-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- -----------------------------
-
-
-
- INSIDE TURBO PASCAL 6.0 UNITS
-
-
-
- -----------------------------
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- by
-
- William L. Peavy
-
- -----------------
-
- Revised: April 16, 1991
-
-
-
-
-
-
- ABSTRACT
-
- If you want to know what is in a .TPU (unit) file produced
- by Version 6.0 of Turbo Pascal from Borland International,
- then this paper is for you. It doesn't explain quite
- everything since I don't have access to secret documents
- or anything like that and since some of the data in .TPU
- files just doesn't have enough auxiliary information to make
- its role clear. However, it is possible to learn a great
- deal about how Turbo Pascal organizes the information it
- needs to refer to, and it is also possible to learn just
- what kind of code the compiler produces.
-
- This is the third in a series of reports on the subject of
- Turbo Pascal Units, the first dealing with Turbo Pascal
- Version 5.0 and the second with Turbo Pascal 5.5. The
- evolution of these files in the face of changing
- requirements has been fascinating to behold and deciphering
- their contents has been challenging to say the least.
-
- The programs supplied with this report have been reorganized
- from their 5.5 style in some ways and many identifiers have
- been changed. These changes were more for style than for
- substance. Other changes were dictated by the changes in
- the organization of the TPU file itself and certain errors
- in the 5.5 programs have been corrected. In addition, other
- errors of interpretation have been fixed, which has led to
- some enhanced descriptive capability.
-
- Since I have a "real" job which requires my full attention,
- and since it doesn't involve use of these products in any
- direct way, I am usually hard-pressed to find the personal
- time to conduct this research. Consequently, I always
- refuse to commit to follow-up or even error correction. It
- would be irresponsible of me to pretend it could be
- otherwise. Even so, this is a revised report which contains
- a few error fixes and discusses the newly enhanced program
- which incorporates these fixes and sports some enhanced
- capabilities.
-
-
-
- Contents
-
-
-
- Introduction ................................................. 5
-
- 1. Gross File Structure ...................................... 5
- 1.1 User Units ........................................... 6
-
- 2. Locators .................................................. 7
- 2.1 Local Links .......................................... 7
- 2.2 Global Links ......................................... 7
- 2.3 Table Offsets ........................................ 7
- 2.4 Basic Relationships .................................. 8
-
- 3. Unit Header .............................................. 11
- 3.1 Description ......................................... 11
- 3.2 UNIT Size ........................................... 14
-
- 4. Symbol Dictionaries ...................................... 14
- 4.1 Organization ........................................ 14
- 4.2 Interface Dictionary ................................ 14
- 4.3 Debug Dictionary .................................... 15
- 4.4 Dictionary Elements ................................. 15
-
- 4.4.1 Hash Tables ................................... 15
- 4.4.1.1 Size .................................... 16
- 4.4.1.2 Scope ................................... 16
- 4.4.1.3 Special Cases ........................... 17
-
- 4.4.2 Dictionary Headers ............................ 17
-
- 4.4.3 Dictionary Stubs .............................. 18
- 4.4.3.1 Label Declaratives ("O") ................ 18
- 4.4.3.2 Un-Typed Constants ("P") ................ 18
- 4.4.3.3 Named Types ("Q") ....................... 18
- 4.4.3.4 Variables, Fields, Typed Cons ("R") ..... 19
- 4.4.3.5 Subprograms & Methods ("S") ............. 20
- 4.4.3.6 Turbo Std Procedures ("T") .............. 21
- 4.4.3.7 Turbo Std Functions ("U") ............... 21
- 4.4.3.8 Turbo Std "NEW" Routine ("V") ........... 21
- 4.4.3.9 Turbo Std Port Arrays ("W") ............. 21
- 4.4.3.10 Turbo Std External Variables ("X") ..... 21
- 4.4.3.11 Units ("Y") ............................ 22
-
- 4.4.4 Type Descriptors .............................. 22
- 4.4.4.1 Scope ................................... 23
- 4.4.4.2 Prefix Part ............................. 23
- 4.4.4.3 Suffix Parts ............................ 24
- 4.4.4.3.1 Un-Typed .......................... 25
- 4.4.4.3.2 Structured Types .................. 25
- 4.4.4.3.2.1 ARRAY Types ................. 25
- 4.4.4.3.2.2 RECORD Types ................ 25
- 4.4.4.3.2.3 OBJECT Types ................ 26
- 4.4.4.3.2.4 FILE (non-TEXT) Types ....... 27
- 4.4.4.3.2.5 TEXT File Types ............. 27
- 4.4.4.3.2.6 SET Types ................... 27
- 4.4.4.3.2.7 POINTER Types ............... 27
- 4.4.4.3.2.8 STRING Types ................ 27
- 4.4.4.3.3 Floating-Point Types .............. 27
- 4.4.4.3.4 Ordinal Types ..................... 28
- 4.4.4.3.4.1 "Integers" .................. 28
- 4.4.4.3.4.2 BOOLEANs .................... 28
- 4.4.4.3.4.3 CHARs ....................... 28
- 4.4.4.3.4.4 ENUMERATions ................ 29
- 4.4.4.3.5 SUBPROGRAM Types .................. 29
-
- 5. Maps and Lists ........................................... 30
- 5.1 PROC Map ............................................ 30
- 5.2 CSeg Map ............................................ 31
- 5.3 Typed CONST DSeg Map ................................ 31
- 5.4 Global VAR DSeg Map ................................. 32
- 5.5 Donor Unit List ..................................... 32
- 5.6 Source File List .................................... 33
- 5.7 DEBUG Trace Table ................................... 34
-
- 6. Code, Data, Fix-Up Info .................................. 35
- 6.1 Object CSegs ........................................ 35
- 6.2 CONST DSegs ......................................... 35
- 6.3 Fix-Up Data Table ................................... 36
-
- 7. Supplied Program ......................................... 37
-
- 7.1 TPU6 ................................................ 37
- 7.1.1 UNIT TPU6AMS .................................. 37
- 7.1.2 UNIT TPU6EQU .................................. 38
- 7.1.3 UNIT TPU6UTL .................................. 38
- 7.1.4 UNIT TPU6RPT .................................. 38
- 7.1.5 UNIT TPU6UNA .................................. 38
-
- 7.2 Modifications ....................................... 39
-
- 7.3 Notes on Program Logic .............................. 39
- 7.3.1 Formatting the Dictionary ..................... 39
- 7.3.2 The Disassembler .............................. 41
-
- 8. Unit Libraries ........................................... 43
- 8.1 Library Structure ................................... 43
-
- 9. Application Notes ........................................ 44
-
- 10. Acknowledgements ........................................ 45
-
- 11. References .............................................. 46
-
- INDEX ....................................................... 47
-
-
-
-
- Inside TURBO Pascal 6.0 Units
- ----------------------------------------------------------------------
-
- INTRODUCTION
-
-
- This document is the outcome of an inquiry conducted into the
- structure and content of Borland Turbo Pascal (Version 6.0) Unit
- files. The original purpose of the inquiry was to provide a body of
- theory enabling Cross-Reference programs to resolve references to
- symbols defined in .TPU files where qualification was not explicitly
- provided. As is so often the case, one thing led to another and the
- scope of the inquiry was expanded dramatically. While this document
- should not be regarded as definitive, the author feels that the entire
- Turbo Pascal User community might gain from the information extracted
- from these files at the cost of so much time and effort.
-
- The material contained herein represents the findings and
- interpretations of the author. A great deal of guess-work was
- required and no assurances are given as to the accuracy of either the
- findings of fact or the inferences contained herein which are the sole
- work-product of the author. In particular, the author had access only
- to materials or information that any normal Borland customer has
- access to. Further, no Borland source code was available, as the
- Library Routine source is not licensed to the author. In short, there
- was nothing irregular about how these findings were achieved.
-
- The material contained herein is placed in the public domain free of
- copyright for use of the general public at its own risk. The author
- assumes no liability for any damages arising from the use of this
- material by others. If you make use of this information and you get
- burned, TOUGH! The author accepts no obligation to correct any such
- errors as may exist in the supplied programs or in the findings of
- fact or opinion contained herein. On the other hand, this is not a
- "complete" work in that a great many questions remain open, especially
- as regards fine details. (The author is not highly-qualified in Intel
- 80xxx Assembly Language and several open questions might best be
- addressed by persons competent in this area.) The author welcomes the
- input of interested readers who might be able to "flesh-out" some of
- these open questions with "hard" answers.
-
-
- 1. GROSS FILE STRUCTURE
-
-
- A Turbo Pascal Unit file consists of an array of bytes whose length is
- an exact multiple of sixteen (16). "Signature" information allows the
- compiler to verify that the .TPU file was compiled with the correct
- compiler version and to verify that the file is of the correct size.
- The fine structure of the file will be addressed in later sections at
- ever increasing levels of detail.
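-
- As a minimal sketch of the checks just described (written in Python
- purely for illustration, and not a reconstruction of what the compiler
- itself does), a candidate file image can be screened as follows. The
- function name is hypothetical; the "TPU9" signature and the 64-byte
- header length are taken from the Unit Header description in section 3.
-
```python
def looks_like_tpu6(image: bytes) -> bool:
    """Cheap plausibility screen for a Turbo Pascal 6.0 unit image."""
    # The Unit Header alone occupies the first 64 bytes.
    if len(image) < 64:
        return False
    # The file length must be an exact multiple of sixteen.
    if len(image) % 16 != 0:
        return False
    # The first four bytes carry the version signature.
    return image[:4] == b"TPU9"
```
-
- A file that passes this screen is only plausibly a unit; the
- independent size check of section 3.2 is a stronger test.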
-
-
-
-
-
-
-
-
- ----------------------------------------------------------------------
- Rev: April 16, 1991 Page 5
-
- Graphically, the file may be regarded as having the following general
- layout:
-
- +-------------------+
- | Unit Header | Main Index to Unit File
- |-------------------|
- | Dictionaries: |
- | a) Interface |
- | b) Debug * | For Local Symbol Access
- |-------------------|
- | PROC Map |
- |-------------------|
- | CSeg Map * | May be Empty
- |-------------------|
- | CONST DSeg Map * | May be Empty
- |-------------------|
- | VAR DSeg Map * | May be Empty
- |-------------------|
- | Donor Units * | May be Empty
- |-------------------|
- | Source Files |
- |-------------------|
- | Trace Table * | May be Empty
- |-------------------|
- | CODE Segment(s) * | May be Empty
- |-------------------|
- | DATA Segment(s) * | May be Empty
- |-------------------|
- | FIX-UP Data * | May be Empty
- +-------------------+
-
-
- 1.1 USER UNITS
-
-
- Units prepared by the compiler available to ordinary users have a very
- straight-forward appearance and content. There may even be a little
- "wasted" space that might be removed if the compiler were just a
- little cleverer. The SYSTEM.TPU file is quite another thing however.
-
- The SYSTEM.TPU file (found in TURBO.TPL) is extraordinary in that
- great pains seem to have been taken to compact it. Further, it
- contains a great many types of entries that just don't seem to be
- achievable by ordinary users and I suspect that much (if not all) of
- it was "hand-coded" in Assembler Language.
-
- In the following sections, the details of these optimizations will be
- explained in the context of the structural element then under
- discussion.
-
-
- 2. LOCATORS
-
-
- The data in these files has need of structure and organization to
- support efficient access by the various programs such as the compiler,
- the linker and the debugger. This organization is built on a solid
- foundation of locators employed in the unit's data structures.
-
-
-
- 2.1 LOCAL LINKS
-
-
- Local Links (LL's) are items of type WORD (2 bytes) which contain an
- offset which is relative to the origin of the unit file itself. This
- implies that a unit must be somewhat less than 64K bytes in size. If
- the .TPU file is loaded into the heap, then an LL can be used to
- locate any byte in the segment beginning with the load point of the
- file.
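-
- A minimal sketch of following an LL, assuming the whole .TPU file has
- been read into one byte string. The helper names are hypothetical;
- the only facts relied upon are that an LL is a 2-byte WORD (stored
- low byte first, as on all Intel processors) and that it is relative
- to the origin of the unit file.
-
```python
import struct


def read_word(image: bytes, offset: int) -> int:
    """Return the little-endian 16-bit WORD stored at `offset`."""
    return struct.unpack_from("<H", image, offset)[0]


def follow_ll(image: bytes, ll_offset: int) -> int:
    """Read the LL stored at `ll_offset`; return the file offset it names."""
    target = read_word(image, ll_offset)
    if target >= len(image):
        raise ValueError("LL points outside the unit image")
    return target
```
-
- Because the LL is relative to the start of the file, the value read
- can be used directly as an index into the image.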
-
-
-
- 2.2 GLOBAL LINKS
-
-
- Global Links (LG's) are used to locate type descriptors and to locate
- allocation data for variables with the ABSOLUTE attribute which may
- reside in other Units (i.e., units external to the present unit).
- LG's are structured items consisting of two (2) words. The first of
- these is an LL that is relative to the origin of the (possibly)
- external unit. It locates either a Type Descriptor or the stub of the
- Dictionary entry which establishes storage allocation. The second
- word is an LL which locates the stub of the unit entry in the current
- unit dictionary for the (possibly) external unit. This dictionary
- entry provides the name of the unit that contains the item the LG
- points to.
-
- This provides a handy mechanism for locating type descriptors and
- allocation information which may be defined in other separately
- compiled units.
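-
- The two-word layout of an LG can be sketched as below. The class and
- field names are illustrative only; the order of the two words follows
- the description above.
-
```python
import struct
from typing import NamedTuple


class GlobalLink(NamedTuple):
    target_ll: int      # LL relative to the (possibly external) unit
    unit_entry_ll: int  # LL, in the current unit, of the stub naming that unit


def read_lg(image: bytes, offset: int) -> GlobalLink:
    """Decode the two little-endian words of the LG stored at `offset`."""
    first, second = struct.unpack_from("<HH", image, offset)
    return GlobalLink(target_ll=first, unit_entry_ll=second)
```
-
- Resolving the link fully would then mean following unit_entry_ll to
- learn the name of the owning unit, loading that unit, and applying
- target_ll within it.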
-
-
-
- 2.3 TABLE OFFSETS
-
-
- Finally, various data-structures within a .TPU file are organized as
- arrays of fixed-length records or as lists of variable-length records.
- Efficient access to such records is achieved by means of offsets
- rather than subscripts (an addressing technique denied Pascal). These
- offsets are relative to the origin of the array or list being
- referenced rather than the origin of the unit.
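-
- The difference between a table offset and an LL can be shown with a
- short sketch. The record size passed in is a hypothetical parameter,
- since the actual record lengths depend on the table concerned.
-
```python
def entry_offsets(table_origin: int, table_length: int, record_size: int):
    """Yield (table_offset, file_offset) for each fixed-length record.

    table_origin is the LL of the table; table_offset is relative to the
    table itself, which is how such records are referenced.
    """
    for table_offset in range(0, table_length, record_size):
        yield table_offset, table_origin + table_offset
```
-
- A reference into a table stores only the first number of each pair;
- the second must be formed by adding the table's own origin.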
-
-
- 2.4 BASIC RELATIONSHIPS
-
-
- +-------------+ +----------------------+
- | Unit | | INTERFACE Dictionary |
- | Header | | |
- +-------------+ | Public and Private |
- | | Names, Nested Hash |
- | LL +----------------+ LL's | Tables, INLINE code, |
- |-------->| INTERFACE Hash |------->| Type Descriptors. |
- | +----------------+ +----------------------+
- | (LL's ^ & LG's)
- | +----------------------+
- | LL +----------------+ LL's | DEBUG Dictionary |
- |-------->| DEBUG Hash |------->| IMPLEMENTATION and |
- | +----------------+ | nested scope names, |
- | ?| stored for DEBUG. |
- | LL +----------------+ | Same structure as in |
- |-------->| PROC Map Table | | INTERFACE. Linked |
- | +----------------+ | to INTERFACE part by |
- | LL +----------------+ | LL's. BUILT ONLY IF |
- |-------->| CSeg Map Table |? | LOCAL SYMBOLS ARE |
- | +----------------+ | ENABLED AT COMPILE. |
- | LL +----------------+ +----------------------+
- |-------->| DSeg Map CONST |?
- | +----------------+
- | LL +----------------+
- |-------->| DSeg Map VAR's |?
- | +----------------+ IMPORTANT NOTES
- | LL +----------------+ ----------------------
- |-------->| Donor Unit List|? Some of the structures
- | +----------------+ shown in this figure
- | LL +------------------+ are built only if they
- |-------->| Source File List | are needed. These are
- | +------------------+ marked by a "?" next
- | LL +------------------+ to the box.
- |-------->| Debug Step Ctls |?
- | +------------------+ If the DEBUG Dictionary
- | ** +---------------+ is missing, its LL
- |-------->| CODE Segments |? leads directly to the
- | +---------------+ INTERFACE Dictionary.
- | ** +-----------------+ ----------------------
- |-------->| CONST DATA Segs |?
- | +-----------------+
- | ** +----------------+
- +-------->| Fix-Up Lists |?
- +----------------+
-
-
-
- This figure illustrates the role of the Unit Header in tying together
- the various data structures in the Unit. The type of link is shown
- next to a flow-line by "LL", "LG" or "**". "LL" and "LG" are explicit
- pointers, while "**" marks a locator whose value is computed from
- other data in the Unit Header; no explicit pointer exists for it.
-
-
-
- +----(from hash tables,other Dictionary Entries)
- |
- | +------------------------------------------------+
- | | Header Part | Stub Part -- many formats |
- +--->| - - - - - - | - - - +------------------------- |
- | | data, | Some stubs have embedded | Dictionary
- | Name, Class | links | Type Descriptors | Entry
- | and link to | (see | +------------------- |
- | entries who | below)| | INLINE Declarative |
- | have same | * | | code bytes for a |
- | hash | | | | "macro" type PROC |
- +-----------------|------------------------------+
- +----------+
- |
- | FAR pntr +----------------------------+
- |----------->| Absolute Memory Locations |
- | +----------------------------+
- | +-----------------------------+
- | LG's | Type Descriptors and stubs |
- |----------->| of Dictionary Entries used |
- | | for absolute equivalences |
- | +-----------------------------+
- | +---------------------------------+
- | LL's | Nested Scope Hash Tables |
- |----------->| Parent Scope Dictionary Entries |
- | | Record Fields |
- | | Object Fields/Methods |
- | +---------------------------------+
- | +----------------------+
- | Offsets | CONST DSeg Map Table |
- +----------->| PROC Map Table |
- | VAR DSeg Map Table |
- +----------------------+
-
-
-
- This figure illustrates the many types of entities that associate with
- Dictionary Entries and particularly with their Stub Parts. Not all of
- the links shown occur in a single Stub format, but all of the links in
- the figure can and do exist in selected cases. The purpose here is to
- show the flexibility of the system of links in associating required
- data with the Dictionary Entry and its identifying symbol.
-
- While it may not be apparent from the figure, the dictionary structure
- as a whole may be viewed as a cyclic directed graph which is rooted in
- the DEBUG Hash Table. The recursive properties exhibited by the node
- relationships permit direct support of the scope rules of Turbo Pascal
- with simplicity and elegance. As one might expect, the representation
- of the required information lends itself to efficient use of storage
- since the representations are compact and there is very little in the
- way of redundancy. The small amount of redundancy that does exist is
- apparently aimed at speeding access to certain structures by the Turbo
- components (compiler, linker and debugger).
-
-
-
- +----(implied links, explicit LG's from other structures)
- |
- | +---------------------------------------------+
- | | Flags and codes, allocation widths for data | Type
- +--->| and VMT's, subrange constraints, formal | Descriptor
- | parameter descriptors, implicit associated | Contents &
- | type descriptors, LL's, LG's and Offsets. | Linkages
- +---------------------------------------------+
- |
- |
- | LG's +------------------+
- |-------------->| Type Descriptors |
- | +------------------+
- |
- | +-------------------------------+
- | LL's | Method Dictionary Entries |
- |-------------->| Nested Scope Hash Tables |
- | | Nested Scope Field Chains |
- | | Parent Scope Dictionary Entry |
- | +-------------------------------+
- |
- | Offsets +----------------------------------+
- +-------------->| VMT pointers in Object Instances |
- | CONST DSeg Map Table Entries |
- +----------------------------------+
-
-
- This figure illustrates the relationships between Type Descriptors and
- other structures in the dictionary. Not all the links shown can exist
- with a single Type Descriptor since there are several variant forms of
- these descriptors (depending on base type) but in combination, these
- linkages are feasible. In addition to links, a great amount of data
- is stored which is peculiar to a given type declaration. Descriptors
- can be -- and are -- shared. Indeed, they were designed with that in
- mind. Once a named type is declared, all entities that reference it
- are linked to it in some way (usually by an LG).
-
- Almost every form of type descriptor is found in the SYSTEM unit and
- this fact is used to advantage. When un-typed constants are declared,
- a built-in type descriptor is referenced (via an LG) which provides
- necessary information for maintenance of orderly dictionary structure.
- When a named-type is declared, it is almost always decomposed into an
- expression based on the built-in types of Turbo Pascal which are found
- in the SYSTEM unit with the aid of an LG.
-
- The semantics underlying the idea of the Unit mandate this very
- approach since program modules of any class which make references to
- units for definitions use the definitions as implemented by the unit
- which contains them. Re-defining the unit or any of its defined types
- leads to a natural requirement to re-compile those program modules
- which rely on the unit for definitions. The impact is fundamental
- since the storage representation of a unit-defined named type can
- change in quite radical ways.
-
-
-
-
- 3. UNIT HEADER
-
-
- The Unit Header comprises the first 64 bytes of the .TPU file. It
- contains LL's that effectively locate all other sections of the .TPU
- file plus statistics that enable a little cross-checking to be
- performed. Some parts of the Unit Header appear to be reserved for
- future use since no unit examined by this author has ever contained
- non-zero data in these apparently reserved fields.
-
-
-
- 3.1 DESCRIPTION
-
-
- The Unit Header provides a high-level locator table whereby each major
- structure in the unit file can be addressed. The following provides a
- Pascal-like explanation of the layout of the header followed by
- further narrative discussion of the contents of the individual fields
- in the Unit Header.
-
- Type HdrAry = Array[0..3] of Char; LL = Word;
-
- UnitHeader = Record
-
- UHEYE : HdrAry; { +00 : = 'TPU9' }
- UHxxx : HdrAry; { +04 : = $00000000 }
- UHUDH : LL; { +08 : to Dictionary Head-This Unit }
- UHIHT : LL; { +0A : to Hash Table (INTERFACE) }
- UHPMT : LL; { +0C : to PROC Map }
- UHCMT : LL; { +0E : to CSeg Map }
- UHTMT : LL; { +10 : to DSeg Map-Typed CONST's }
- UHDMT : LL; { +12 : to DSeg Map-GLOBAL Variables }
- UHxxy : LL; { +14 : Purpose Unknown }
- UHLDU : LL; { +16 : to Donor Unit List }
- UHLSF : LL; { +18 : to Source file List }
- UHDBT : LL; { +1A : to Debug Trace Step Controls }
- UHENC : LL; { +1C : to end non-code part of Unit }
- UHZCS : Word; { +1E : Size of CSEGs (aggregate) }
- UHZDT : Word; { +20 : Size of Typed Constant Data }
- UHZFA : Word; { +22 : Fix-Up Bytes (CSegs) }
- UHZFT : Word; { +24 : Fix-Up Bytes (Typed CONST's) }
- UHZFV : Word; { +26 : Size of GLOBAL VAR Data }
- UHDHT : LL; { +28 : to Hash Table (DEBUG) }
- UHSOV : Word; { +2A : Overlay Involved if non-zero }
- UHPad : Array[0..9]
- of Word; { +2C : Reserved for Future Expansion }
-
- End; { UnitHeader }
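-
- For readers who would rather inspect a unit from a script, the same
- layout can be decoded with a short Python sketch. The field names
- follow the record above (with UHIHT as used in the field
- descriptions); the function and tuple names are hypothetical, and
- little-endian byte order is assumed, as on all Intel processors.
-
```python
import struct
from collections import namedtuple

# Two 4-byte arrays, eighteen words (+08 to +2A), then ten reserved words.
HEADER_FORMAT = "<4s4s18H10H"
assert struct.calcsize(HEADER_FORMAT) == 64

FIELDS = ("UHEYE UHxxx UHUDH UHIHT UHPMT UHCMT UHTMT UHDMT UHxxy UHLDU "
          "UHLSF UHDBT UHENC UHZCS UHZDT UHZFA UHZFT UHZFV UHDHT UHSOV")
UnitHeader = namedtuple("UnitHeader", FIELDS + " UHPad")


def parse_unit_header(image: bytes) -> UnitHeader:
    """Decode the first 64 bytes of a unit image into named fields."""
    values = struct.unpack_from(HEADER_FORMAT, image, 0)
    # The first twenty values are the named fields; the rest are padding.
    return UnitHeader(*values[:20], UHPad=values[20:])
```
-
- The assertion on the format size is a cheap guard: if a field were
- added or dropped by mistake, the layout would no longer total 64 bytes.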
-
- UHEYE contains the characters "TPU9" in that order. This is
- clear evidence that this unit was compiled by Turbo Pascal
- Version 6.0.
-
- UHxxx is apparently reserved and contains binary zeros.
-
- UHUDH contains an LL (WORD) which points to the Dictionary
- Header in which the name of this unit is found.
-
- UHIHT contains an LL (WORD) which points to a Hash table that is
- the root of the Interface Dictionary graph.
-
- UHPMT contains an LL (WORD) which points to the PROC Map for
- this unit. The PROC Map contains an entry for each
- Procedure or Function declared in the unit (except for
- INLINE types), plus an entry for the Unit Initialization
- section. The length of the PROC Map (in bytes) is
- determined by subtracting UHPMT from UHCMT.
-
- UHCMT contains an LL (WORD) which points to the CSeg (CODE
- Segment) Map for this unit. The CSeg Map contains an
- entry for each CODE Segment produced by the compiler plus
- an entry for each of the CODE Segments included via the
- {$L filename.OBJ} compiler directive. The length of this
- Map (in bytes) is obtained by subtracting UHCMT from
- UHTMT. The result may be zero in which case the CSeg Map
- is empty.
-
- UHTMT contains an LL (WORD) which points to the DSeg (DATA
- Segment) Map that maps the initializing data for Typed
- CONST items plus templates for VMT's (Virtual Method
- Tables) that are associated with OBJECTS which employ
- Virtual Methods. The length of this Map (in bytes) is
- obtained by subtracting UHTMT from UHDMT. The result may
- be zero in which case this DSeg Map is empty.
-
- UHDMT contains an LL (WORD) which points to the DSeg (DATA
- Segment) Map that contains the specifications for DSeg
- storage required by VARiables whose scope is GLOBAL. The
- length of this Map (in bytes) is obtained by subtracting
- UHDMT from UHxxy. The result may be zero in which case
- this DSeg Map is empty.
-
- UHxxy Purpose of this word is unknown. No non-zero values have
- ever been observed here. (May be for TP-Windows?)
-
- UHLDU contains an LL (WORD) which points to a table of units
- which contribute either CODE or DATA Segments to the .EXE
- file for a program using this Unit. This is called the
- "Donor Unit Table". The length of this table (in bytes)
- is obtained by subtracting UHLDU from the word UHLSF. The
- result may be zero in which case this table is empty.
-
-
- UHLSF contains an LL (WORD) which points to a list of "source"
- files. These are the files whose CODE or DATA Segments
- are included in this Unit by the compiler. Examples are
- the Pascal Source for the Unit itself, plus the .OBJ files
- included via the {$L filename.OBJ} compiler directive.
- The length of this table (in bytes) is obtained by
- subtracting UHLSF from the word UHDBT. The result may be
- zero in which case this table is empty.
-
- UHDBT contains an LL (WORD) which points to a Trace Table used
- by the DEBUGGER for "stepping" through a Function or
- Procedure contained in this Unit. The length of this
- table (in bytes) is obtained by subtracting UHDBT from the
- word UHENC. The result may be zero in which case this
- table is empty.
-
- UHENC contains an LL (WORD) which points to the first free byte
- which follows the Trace Table (if any). It serves as a
- delimiter for determining the size of the Trace Table.
- This LL (when rounded up to the next integral multiple of
- 16) serves to locate the start of the code/data segments.
-
- UHZCS is a WORD that contains the total byte count of all CODE
- Segments compiled into this Unit.
-
- UHZDT is a WORD that contains the total byte count of all Typed
- CONST and VMT DATA Segments compiled into this unit.
-
- UHZFA is a WORD that contains the total byte count of the Fix-Up
- Data Table for this unit for CODE (CSegs).
-
- UHZFT is a WORD that contains the total byte count of the Fix-Up
- Data Table for Typed CONST's. This usually implies that a
- VMT is getting its pointers relocated.
-
- UHZFV is a WORD that contains the total byte count of all GLOBAL
- VAR DATA Segments compiled into this unit.
-
- UHDHT contains an LL (WORD) which points to a Hash Table which
- is the root of the DEBUGGER Dictionary. If Local Symbols
- were generated by the compiler (directive {$L+}) then ALL
- symbols declared in the unit can be accessed from this
- Hash Table. If Local Symbols were suppressed there is no
- such Dictionary and the LL stored here points to the
- INTERFACE Dictionary.
-
- UHSOV Purpose of this word is unknown. It has been observed to
- be non-zero when overlay directives are used. So far
- however, this hasn't enabled me to come up with a good
- guess as to just what the observed values actually mean.
-
- UHPad begins a series of ten (10) words that are apparently
- reserved for future use. Nothing but zeros have ever been
- seen here by this author.
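-
- The length rules given in the field descriptions above can be
- gathered into one sketch. It takes the header as a plain mapping of
- field name to value; the function names are hypothetical. The GLOBAL
- VAR DSeg Map is left out on purpose, because its length depends on
- UHxxy, whose role is described above as unknown.
-
```python
def round_up_16(n: int) -> int:
    """Round n up to the next integral multiple of 16."""
    return (n + 15) & ~15


def section_lengths(h: dict) -> dict:
    """Byte length of each table, from differences of successive LL's."""
    return {
        "PROC Map": h["UHCMT"] - h["UHPMT"],
        "CSeg Map": h["UHTMT"] - h["UHCMT"],
        "CONST DSeg Map": h["UHDMT"] - h["UHTMT"],
        "Donor Unit List": h["UHLSF"] - h["UHLDU"],
        "Source File List": h["UHDBT"] - h["UHLSF"],
        "Trace Table": h["UHENC"] - h["UHDBT"],
    }


def code_start(h: dict) -> int:
    """File offset at which the code/data segments begin."""
    return round_up_16(h["UHENC"])
```
-
- A length of zero from section_lengths simply means that the table
- concerned is empty.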
-
-
-
- 3.2 UNIT SIZE
-
-
- An independent check on the size of the .TPU file is available using
- information contained in the Unit Header. This is also important for
- .TPL (Unit Library) organization. To compute the file size, refer to
- the five (5) words -- UHENC, UHZCS, UHZDT, UHZFA, and UHZFT. Round
- the contents of each of these words to the lowest multiple of 16 that
- is greater than or equal to the content of that word. Then form the
- sum of the rounded words. This is the .TPU file size in bytes.
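-
- The computation is small enough to show in full. This is a sketch of
- the rule just stated; the function names are hypothetical.
-
```python
def round_up_16(n: int) -> int:
    """Lowest multiple of 16 that is greater than or equal to n."""
    return (n + 15) // 16 * 16


def unit_file_size(uhenc: int, uhzcs: int, uhzdt: int,
                   uhzfa: int, uhzft: int) -> int:
    """Expected .TPU size in bytes, from the five header words."""
    return sum(round_up_16(w) for w in (uhenc, uhzcs, uhzdt, uhzfa, uhzft))
```
-
- Comparing the result with the actual length of the file gives the
- independent check mentioned above. Since every term is a multiple of
- 16, so is the total.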
-
-
- 4. SYMBOL DICTIONARIES
-
-
- This area contains all available documentation of declared symbols and
- procedure blocks defined within the unit. Depending on compiler
- options in effect when the unit was compiled, this section will
- contain at a minimum, the INTERFACE declarations, and at a maximum,
- ALL declarations. The information stored in the dictionary is highly
- dependent on the context of the symbol declared. We defer further
- explanation to the appropriate section which follows.
-
-
- 4.1 ORGANIZATION
-
-
- A dictionary is organized with a Hash Table as its root. The hash
- table is used to provide rapid access to identifiers.
-
- A dictionary may be thought of as a directed graph. Each subgraph is
- rooted in a hash table. There may be a great many hash tables in a
- given unit and their number depends on unit complexity as well as the
- options chosen when the unit was compiled. Use of the {$L+} directive
- produces the largest dictionaries. The hash tables are explained in
- detail a few sections further on.
-
- Hash tables point to Dictionary Headers. When two or more symbols
- produce the same hash function result, a collision is said to occur.
- Collisions are resolved by the time-honored method of chaining
- together the Dictionary Headers of those symbols having the same hash
- function result. Dictionary supersetting is accomplished using these
- chains.
-
-
- 4.2 INTERFACE DICTIONARY
-
-
- The INTERFACE dictionary contains all symbols and the necessary
- explanatory data for the INTERFACE section of a Unit. Symbols get
- added to the Unit using increasing storage addresses until the
- IMPLEMENTATION section is encountered.
-
-
- 4.3 DEBUG DICTIONARY
-
-
- The Debug dictionary (if present) is a superset of the INTERFACE
- dictionary. It is used by the Turbo Debugger to support its many
- features when tracing through a unit. If present, this dictionary is
- rooted in its own hash table. The hash table is effectively
- initialized when the IMPLEMENTATION keyword is processed by the
- compiler. This takes the form (initially) of an unmodified copy of
- the INTERFACE hash table, to which symbols are added in the usual
- fashion. Thus, the hash chains constructed or extended at this time
- lead naturally to the INTERFACE chains and this is how the superset is
- effectively implemented.
-
-
- 4.4 DICTIONARY ELEMENTS
-
-
- The dictionary contains four major elements. These are: hash tables,
- Dictionary Headers, Dictionary Stubs and Type Descriptors. The
- distinction between Dictionary Headers and Stubs might appear to be
- rather arbitrary. They might just as easily be regarded as a single
- element (such as a symbol entry). However, the case for the separate
- entity approach is strong since Stubs are DIRECTLY addressed via LG's
- and -- more to the point -- ONLY by LG's. Thus, it seems reasonable
- that this is a separate and very important structure -- at least in
- the minds of the architects at Borland.
-
-
- 4.4.1 HASH TABLES
-
-
- As has been intimated, Hash Tables are the glue that binds the
- dictionary entries together and gives the dictionary its "shape".
- They effectively implement the scope rules of the language and speed
- access to essential information.
-
- Each Hash Table begins with a 2-byte size descriptor. This descriptor
- contains the number of bytes in the table proper (less 2). Thus, the
- descriptor is also the byte offset of the last bucket, measured from
- the first bucket. For a hash table of 128 bytes, the size descriptor
- contains 126. The first bucket in the table immediately follows the
- size descriptor.
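-
- As a minimal sketch of the layout just described, the Python fragment
- below reads the size descriptor and returns the buckets. The name
- read_hash_table and the sample bytes are illustrative assumptions, not
- anything taken from Borland; words are read as little-endian, as on
- all 80x86 machines.
-
```python
import struct

def read_hash_table(unit, offset):
    """Return the buckets (LLs, or 0 when empty) of the hash table that
    starts at `offset` in the unit image."""
    # Size descriptor: number of bytes in the table proper, less 2.
    (size_desc,) = struct.unpack_from("<H", unit, offset)
    slot_count = (size_desc + 2) // 2        # each bucket is one word
    return list(struct.unpack_from("<%dH" % slot_count, unit, offset + 2))

# Made-up four-slot table: descriptor 0006h followed by four buckets.
demo = struct.pack("<5H", 0x0006, 0x0000, 0x0120, 0x0000, 0x0044)
buckets = read_hash_table(demo, 0)
print([hex(b) for b in buckets])
```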
-
-
- 4.4.1.1 SIZE
-
-
- So far, three different hash table sizes have been observed. The
- INTERFACE and DEBUG hash tables are usually 128 bytes (64 entries) in
- size plus 2 bytes of size description, but the SYSTEM.TPU unit is a
- special case, containing only 16 entries. Hash tables which anchor
- subgraphs whose scope is relatively local usually contain four (4)
- entries (8 bytes).
-
- Graphically, a Hash Table with four slots has the following layout:
-
- +--------------------+
- | 0006h | Size Descriptor
- |--------------------|
- | slot 0 | an LL or zero
- |--------------------|
- | slot 1 | an LL or zero
- |--------------------|
- | slot 2 | an LL or zero
- |--------------------|
- | slot 3 | an LL or zero
- +--------------------+
-
- It should be noted that the Size Descriptor furnishes an upper bound
- for the hash function itself. Thus, it seems possible that a single
- hash function is used for all hash tables and that its result is ANDed
- with the Size Descriptor to get the final result. Because the sizes
- are chosen as they are (powers of 2) this is feasible. Note that in
- the above example, 6 = 2 * (n - 1) where n = 4 {slot count}. All of
- the hash tables observed so far have this property.
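-
- The masking idea can be tried out directly. In this sketch the raw
- hash values are simply the integers 0..255, because the real hash
- function is not known; bucket_offset and descriptor_for are
- hypothetical names chosen for the illustration.
-
```python
def bucket_offset(raw_hash, size_descriptor):
    """Reduce a raw hash value to a byte offset into the bucket array by
    ANDing it with the table's size descriptor."""
    return raw_hash & size_descriptor

def descriptor_for(slot_count):
    """Size descriptor of a table with `slot_count` word-sized slots."""
    return 2 * (slot_count - 1)

# Descriptor 6 is binary 110, so only the even offsets 0, 2, 4 and 6
# can result: exactly the four slots of the example table.
offsets = sorted({bucket_offset(h, descriptor_for(4)) for h in range(256)})
print(offsets)
```
-
- The same mask works for any power-of-2 slot count, since 2 * (n - 1)
- then has a clear low bit and a solid run of one bits above it.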
-
- One final note on this subject. Given these properties, "Folding" of
- sparse hash tables is a rather trivial exercise so long as the new
- hash table also contains a number of slots that is a power of 2. This
- point is intriguing when one recalls that the SYSTEM.TPU hash table
- has only 16 slots rather than the usual 64.
-
-
-
- 4.4.1.2 SCOPE
-
-
- The INTERFACE and Debug dictionary hash tables are Global in Scope
- even though the symbols accessed directly via either hash table may be
- private. On the other hand, other hash tables are purely local in
- scope. For example, the fields declared within a record are reached
- via a small local hash table, as are the arguments and local variables
- declared within procedures and functions. Even OBJECTS use this
- technique to provide access to Methods and Object Fields.
-
- Access to such local scope fields/methods requires use of qualified
- names which ensures conformity to Pascal scope rules. The method is
- truly simple and elegant.
-
-
- 4.4.1.3 SPECIAL CASES
-
-
- The SYSTEM.TPU Unit is a special case. Its INTERFACE hash table has
- apparently been "hand-tuned" for small size and it contains only
- sixteen (16) entries. In addition, since there is no local symbol
- generation in this unit, the Debug hash table does not exist as a
- separate entity; its function is served by the INTERFACE hash table.
- The pointer to the Debug hash table (in the Unit Header) has the same
- value as the pointer to the INTERFACE hash table.
-
-
- 4.4.2 DICTIONARY HEADERS
-
-
- This is the structure that anchors all information known by the
- compiler about any symbol. The format is as follows:
-
- +00: An LL which points to the next (previous) symbol in the
- same unit which had the same hash function value.
-
- +02: A character that defines the category the symbol belongs
- to and defines the format of the Dictionary Stub which
- follows the Dictionary Header. If the symbol is declared
- in the component list of the "private" part of an Object
- declaration, then this character is modified by adding $80
- to its ordinal value. Thus, an ordinary Function,
- Procedure or Method is of category "S" while a private
- Method is of category Chr(Ord('S')+$80).
-
- +03: A String (in the Pascal sense) of variable size that
- contains the text of the symbol (in UPPER-CASE letters
- only). The SizeOf function is not defined for these
- strings since they are truncated to match the symbol size,
- but the equivalent value is Ord(Symbol[0])+1: the length
- byte plus the characters themselves. Turbo Pascal defines
- a symbol as a string of relatively arbitrary size, of
- which only the most significant 63 characters are stored
- in the dictionary. Thus, we conclude that the maximum
- size of such a string is 64 bytes.
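-
- The header format lends itself to a short decoding routine. The
- Python sketch below is only an illustration: read_dict_header and
- find_symbol are hypothetical names, the sample image is made up, and
- it is an assumption that an LL of zero ends a collision chain (as it
- marks an empty hash slot).
-
```python
import struct

def read_dict_header(unit, offset):
    """Decode the Dictionary Header at `offset`; the 'stub' entry is
    the offset at which the Dictionary Stub begins."""
    (next_ll,) = struct.unpack_from("<H", unit, offset)
    raw_category = unit[offset + 2]
    name_len = unit[offset + 3]                      # Ord(Symbol[0])
    name = unit[offset + 4:offset + 4 + name_len].decode("ascii")
    return {
        "next": next_ll,
        "category": chr(raw_category & 0x7F),        # strip the $80 mark
        "private": bool(raw_category & 0x80),
        "name": name,
        "stub": offset + 3 + name_len + 1,           # Ord(Symbol[0])+1
    }

def find_symbol(unit, first_ll, wanted):
    """Follow a collision chain from a bucket's LL until `wanted` is
    found; an LL of zero is assumed to end the chain."""
    ll = first_ll
    while ll:
        hdr = read_dict_header(unit, ll)
        if hdr["name"] == wanted.upper():
            return hdr
        ll = hdr["next"]
    return None

# Made-up image: variable COUNT chained to a private method DONE.
image = bytearray(0x40)
image[0x10:0x19] = struct.pack("<HB", 0x0020, ord("R")) + b"\x05COUNT"
image[0x20:0x28] = struct.pack("<HB", 0x0000, ord("S") + 0x80) + b"\x04DONE"
hit = find_symbol(bytes(image), 0x10, "done")
print(hit)
```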
-
-
- 4.4.3 DICTIONARY STUBS
-
-
- Dictionary Stubs immediately follow their respective headers and their
- format is determined by the category character in the Dictionary
- Header. The function of the stub is to organize the information
- appropriate to the symbol and provide a means of accessing additional
- information such as type descriptors, constant values, parameter lists
- and nested scopes. The format of each Stub is presented in the
- following sub-sections.
-
-
- 4.4.3.1 LABEL DECLARATIVES ("O")
-
-
- This Stub consists of a WORD whose function is (as yet) unknown.
-
-
- 4.4.3.2 UN-TYPED CONSTANTS ("P")
-
-
- This Stub consists of two (2) fields:
-
- +00: An LG which points to a Type Descriptor (usually in
- SYSTEM.TPU). This establishes the minimum storage
- requirement for the constant. The rules vary with the
- type, but the size of the constant data field (which
- follows) is defined using the Type Descriptor(s).
-
- +04: The value of the constant. For ordinal types, this value
- is stored as a LONGINT (size=4 bytes). For Floating-Point
- types, the size is implicit in the type itself. For
- String types, the size is determined from the length of
- the string which is stored in the initial byte of the
- constant.
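-
- The three sizing rules can be summarized in a few lines of Python.
- This is a sketch under stated assumptions: the type codes are those
- listed for the Type Descriptor prefix in section 4.4.4.2, the
- function name constant_value_size is hypothetical, and any type not
- covered by the rules above is simply rejected.
-
```python
ORDINAL_CODES = (0x0C, 0x0D, 0x0E, 0x0F)   # integer, BOOLEAN, CHAR, enum
FLOAT_CODES = (0x0A, 0x0B)                 # 8087 types and REAL
STRING_CODE = 0x09

def constant_value_size(type_code, type_size, value):
    """Bytes occupied by the value field at +04 of a "P" stub.

    type_code and type_size come from the Type Descriptor named by the
    LG at +00; `value` is the data that starts at +04."""
    if type_code in ORDINAL_CODES:
        return 4                      # always stored as a LONGINT
    if type_code in FLOAT_CODES:
        return type_size              # implicit in the type itself
    if type_code == STRING_CODE:
        return 1 + value[0]           # length byte plus the characters
    raise ValueError("no rule known for type code %02Xh" % type_code)

print(constant_value_size(0x0C, 2, b"\x2a\x00\x00\x00"))   # INTEGER
print(constant_value_size(0x0A, 10, bytes(10)))            # EXTENDED
print(constant_value_size(0x09, 256, b"\x05HELLO"))        # string
```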
-
-
- 4.4.3.3 NAMED TYPES ("Q")
-
-
- This Stub consists of an LG (4-bytes) that points to the Type
- Descriptor for this symbol.
-
-
- 4.4.3.4 VARIABLES, FIELDS, TYPED CONS ("R")
-
-
- This Stub contains information required to allocate and describe these
- types of entities. The format and content is as follows:
-
- +00: A one-byte flag that precisely identifies the class of the
- item being described. The known values and their apparent
- meanings follow:
-
- $00 -> Global Variables (Allocated in DS);
- $01 -> Typed Constants (Allocated in DS);
- $02 -> Procedure LOCAL Variables on STACK;
- $03 -> Variables at Absolute Addresses;
- $06 -> ADDRESS Arguments allocated on STACK (this is now
- used only for SELF in Method calls);
- $08 -> Fields sub-allocated in RECORDS and OBJECTS, plus
- METHODS declared for OBJECTS;
- $10 -> Variable Equivalenced to another via the
- Absolute Clause;
- $22 -> Arguments whose VALUEs are passed on the stack;
- $26 -> Arguments whose ADDRESSes are passed on the stack.
-
- +01: Two words whose content varies with the codes above. Their
- content is explained following the last item in the stub.
-
- +05: An LG that locates the proper Type Descriptor for this
- symbol.
-
- When the code byte at +00 is $02, $06, $22 or $26 (local variables
- and arguments on the stack), the two words at +01 are used as follows:
-
- +01 Word -- Offset relative to either DS or BP.
- +03 Word -- LL to Dict Header of Parent Scope, or zero.
-
- If the code byte is $00 or $01 (VAR's or typed CONSTs), then we have:
-
- +01 Word -- Offset relative to allocation area origin;
- +03 Word -- Offset to entry in VAR/CONST Map for item
- allocation;
-
- When the code byte is $03 (Absolute Address Variable), then we have:
-
- +01 DWord -- FAR Pointer to Absolute Memory Address.
-
- When the code byte is $08 (Record/Object Fields/Methods), then we
- have:
-
- +01 Word -- Allocation Offset within Record/Object;
- +03 Word -- LL to next Field/Method.
-
- When the code byte is $10 (Absolute Equivalences), then we have:
-
- +01 DWord -- LG to STUB of variable/parameter declaration that
- actually establishes the allocation;
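-
- The case analysis above maps naturally onto a small decoder. The
- Python sketch below is illustrative only: read_var_stub and the
- dictionary keys are hypothetical names, the sample stubs are made up,
- and each LG is kept as a raw pair of words rather than interpreted.
-
```python
import struct

STACK_CLASSES = (0x02, 0x06, 0x22, 0x26)

def read_var_stub(stub):
    """Decode the 9-byte "R" stub: flag byte, two words, type LG."""
    flag, w1, w2, lg0, lg1 = struct.unpack_from("<BHHHH", stub, 0)
    info = {"flag": flag, "type_lg": (lg0, lg1)}
    if flag in STACK_CLASSES:
        info.update(offset=w1, parent_ll=w2)
    elif flag in (0x00, 0x01):
        info.update(offset=w1, map_offset=w2)
    elif flag == 0x03:
        # 80x86 far pointers store the offset word first, then segment.
        info.update(segment=w2, offset=w1)
    elif flag == 0x08:
        info.update(offset=w1, next_ll=w2)
    elif flag == 0x10:
        info.update(target_lg=(w1, w2))
    return info

# Made-up field stub: offset 4 in its record, next field at LL 0150h.
field = read_var_stub(
    struct.pack("<BHHHH", 0x08, 4, 0x0150, 0x0200, 0x0010))
# Made-up absolute variable located at 0040:0017.
absolute = read_var_stub(
    struct.pack("<BHHHH", 0x03, 0x0017, 0x0040, 0x0200, 0x0010))
print(field)
print(absolute)
```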
-
-
-
- 4.4.3.5 SUBPROGRAMS & METHODS ("S")
-
-
- Subprograms (PROC's) have a rather involved stub, especially now that
- Object Methods are supported. Its format is as follows:
-
- +00: A byte that contains bit-switches that seem to describe
- the Call Model and imply the size of this stub. These
- switches determine what kind of code (if any) is generated
- when the PROC is referenced. The observed values are as
- follows:
-
- xxxxx001 -> PROC uses FAR Call Model;
- xxxx0010 -> PROC uses INLINE Model (no Call);
- xxxx0100 -> PROC uses INTERRUPT Model (no Call);
- xxxx100x -> PROC has EXTERNAL attribute;
- xxx1xxxx -> PROC uses METHOD Call Model;
- x011xxxx -> PROC is a CONSTRUCTOR Method;
- x101xxxx -> PROC is a DESTRUCTOR Method;
- 1xxxxxxx -> PROC has ASSEMBLER directive.
-
- +01: A byte whose function is not yet known. (TP Windows?)
-
- +02: A Word whose interpretation depends on whether or not we
- have an INLINE Declarative Subprogram. If this is an
- INLINE Declarative Subprogram, then this word contains the
- byte-count of the INLINE code text at the end of this
- stub. Otherwise, this word is the offset within the PROC
- Map that locates the object code for this Subprogram.
-
- +04: A Word that contains an LL which locates the containing
- scope in the dictionary, or zero if none.
-
- +06: A Word that contains an LL which locates the local Hash
- Table for this scope. A local hash table provides access
- to all formal parameters of the Subprogram as well as all
- Symbols whose declarations are local to the scope of this
- Subprogram.
-
- +08: A Word that is zero unless the symbol is a Virtual Method.
- In that case, the content is the offset within the VMT of
- the owning object at which the FAR POINTER to this Virtual
- Method is stored.
-
- +0A: A complete Type-Descriptor for this Subprogram. The
- length is variable and depends upon the number of Formal
- Parameters declared in the header. (See 4.4.4.3.5).
-
- +??: If this Symbol represents an INLINE Declarative
- Subprogram, then the object-code text begins here. The
- byte-count of the text occurs at offset 0002h in this
- stub.
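-
- The bit patterns at +00 can be transcribed into a (mask, value) table
- and the fixed part of the stub unpacked in one step. The following
- Python sketch does just that; describe_flags and read_proc_stub are
- hypothetical names, and the sample stub is made up.
-
```python
import struct

# (mask, value, meaning), transcribed from the bit patterns above.
FLAG_PATTERNS = [
    (0x07, 0x01, "FAR"),          # xxxxx001
    (0x0F, 0x02, "INLINE"),       # xxxx0010
    (0x0F, 0x04, "INTERRUPT"),    # xxxx0100
    (0x0E, 0x08, "EXTERNAL"),     # xxxx100x
    (0x10, 0x10, "METHOD"),       # xxx1xxxx
    (0x70, 0x30, "CONSTRUCTOR"),  # x011xxxx
    (0x70, 0x50, "DESTRUCTOR"),   # x101xxxx
    (0x80, 0x80, "ASSEMBLER"),    # 1xxxxxxx
]

def describe_flags(flags):
    """Names of every pattern that the flag byte matches."""
    return [name for mask, value, name in FLAG_PATTERNS
            if (flags & mask) == value]

def read_proc_stub(stub):
    """Decode the fixed 10-byte part of an "S" stub."""
    (flags, unknown, word2, scope_ll, hash_ll,
     vmt_offset) = struct.unpack_from("<BBHHHH", stub, 0)
    names = describe_flags(flags)
    inline = "INLINE" in names
    return {
        "flags": names,
        "inline_bytes": word2 if inline else None,
        "proc_map_offset": None if inline else word2,
        "scope_ll": scope_ll,
        "hash_ll": hash_ll,
        "vmt_offset": vmt_offset,
        "type_descriptor_at": 0x0A,
    }

# Made-up stub for a FAR constructor whose PROC Map offset is 0018h.
demo = struct.pack("<BBHHHH", 0x31, 0, 0x0018, 0x0100, 0x0140, 0)
print(read_proc_stub(demo))
```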
-
-
-
- 4.4.3.6 TURBO STD PROCEDURES ("T")
-
-
- This Stub consists of two bytes, the first of which is unique for each
- procedure and increments by 4. I have found nothing in the SYSTEM
- unit (which is where this entry appears) to which this byte seems
- directly related. The second byte is always zero.
-
-
- 4.4.3.7 TURBO STD FUNCTIONS ("U")
-
-
- This Stub consists of two bytes, the first of which is unique for each
- function and increments by 4. I have found nothing in the SYSTEM unit
- (which is where this entry appears) to which this byte seems directly
- related. I wouldn't be surprised if this byte were an index into a
- TURBO compiler table that points to specialized parse tables/action
- routines for handling these functions and their non-standard parameter
- lists.
-
- The second byte seems to be a flag having the values $00, $40 and $C0.
- I strongly suspect that the flag $C0 marks exactly those functions
- which may be evaluated at compile-time. The meaning behind the other
- values is not known to me.
-
-
- 4.4.3.8 TURBO STD "NEW" ROUTINE ("V")
-
-
- This Stub consists of a WORD whose function is (as yet) unknown. This
- is the only Standard Turbo routine that can behave as a procedure as
- well as a function (returning a pointer value).
-
-
- 4.4.3.9 TURBO STD PORT ARRAYS ("W")
-
-
- This Stub consists of a byte whose value is 0 for byte arrays, and 1
- for word arrays.
-
-
- 4.4.3.10 TURBO STD EXTERNAL VARIABLES ("X")
-
-
- This Stub consists of an LG (4-bytes) that points to the Type
- Descriptor for this symbol.
-
-
- 4.4.3.11 UNITS ("Y")
-
-
- Unit Stubs have the following content:
-
- +00: A Word that is apparently reserved for use by the Compiler
- or Linker.
-
- +02: A Word that seems to contain some kind of "signature" used
- to detect inconsistent Unit Versions. Borland calls this
- a "unit version number, which is basically a checksum of
- the interface part." I have seen a thread in CIS which
- says that it is a CRC value. Food for thought?
-
- +04: A Word that contains an LL which locates the Successor
- Unit in the "Uses" list. In fact, the "Uses" lists of
- both the INTERFACE and IMPLEMENTATION sections of the Unit
- are merged by this Word into a single list. A value of
- zero is used to indicate no successor.
-
- +06: A Word that contains an LL which locates the Predecessor
- Unit in the "Uses" list. For the SYSTEM unit entry, this
- value is always zero to indicate no predecessor. For the
- Unit being compiled, this LL locates the final Unit in the
- combined "Uses" list.
-
- In effect, the two LL's at offsets 0004 and 0006 organize the units
- into both forward and backward linked chains. The entry for the unit
- being compiled is effectively the head of both the forward and the
- backward chains. The final unit in the merged "Uses" list is the tail
- of the forward chain, and the SYSTEM unit is the tail of the backward
- chain.
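-
- The two chains can be followed with a simple loop. In this Python
- sketch the names read_unit_stub and walk are hypothetical, the three
- stubs and their LLs are made up, and the table is keyed directly by
- the LL of each unit's Dictionary Header to keep the example short.
-
```python
import struct

def read_unit_stub(stub):
    """Decode the four words of a "Y" stub."""
    reserved, version, next_ll, prev_ll = struct.unpack_from("<4H", stub, 0)
    return {"version": version, "next": next_ll, "prev": prev_ll}

def walk(stubs, start, link):
    """List the LLs reached from `start` by following `link` ("next"
    or "prev") until a zero LL is met."""
    order, ll = [], start
    while ll:
        order.append(ll)
        ll = stubs[ll][link]
    return order

# Made-up dictionary: the unit being compiled (LL 0040h) uses SYSTEM
# (LL 0060h) and one other unit (LL 0080h).
stubs = {
    0x0040: read_unit_stub(struct.pack("<4H", 0, 0x1234, 0x0060, 0x0080)),
    0x0060: read_unit_stub(struct.pack("<4H", 0, 0x5678, 0x0080, 0x0000)),
    0x0080: read_unit_stub(struct.pack("<4H", 0, 0x9ABC, 0x0000, 0x0060)),
}
forward = walk(stubs, 0x0040, "next")
backward = walk(stubs, 0x0040, "prev")
print([hex(ll) for ll in forward])
print([hex(ll) for ll in backward])
```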
-
-
- 4.4.4 TYPE DESCRIPTORS
-
-
- Type Descriptors store much of the semantic information that applies
- to the symbols declared in the unit. Implementation details can be
- managed using high-level abstractions and these abstractions can be
- shared.
-
-
- 4.4.4.1 SCOPE
-
-
- Type Descriptor sharing can occur across the boundaries which are
- implicit in unit modules. Thus, a type defined in one unit may be
- "imported" by some other module. Also, the pre-defined Pascal Types
- (plus the Turbo Pascal extensions) are defined in the SYSTEM.TPU unit
- and there needs to be a means of "importing" such Type Descriptors
- during compilation. This is precisely the objective of the LG locator
- which was described in section 2.2 (above). Type Descriptors are
- NEVER copied between units. The binding always occurs by reference at
- compile time and this helps support the technique of modifying a unit
- and compiling it to a .TPU file, then re-compiling all units/programs
- that "USE" it.
-
- Type Descriptors have many roles so their format varies. We have
- divided these structures into two parts: the PREFIX Part, which is
- always present and whose format is fairly constant, and the SUFFIX
- Part, whose content and format depend on the attributes that are part
- of the type definition.
-
-
- 4.4.4.2 PREFIX PART
-
-
- The Prefix Part of every Type Descriptor consists of six (6) bytes.
- The usage is consistent for all types observed by this author and the
- format is as follows:
-
- +00: A Byte that identifies the format of the Suffix part.
- This is essentially based on several high-level categories
- which the Suffix Parts support directly. The observed set
- of values is as follows:
-
- 00h -> an un-typed entity;
- 01h -> an ARRAY type;
- 02h -> a RECORD type;
- 03h -> an OBJECT type;
- 04h -> a FILE type (other than TEXT);
- 05h -> a TEXT File type;
- 06h -> a SUBPROGRAM type;
- 07h -> a SET type;
- 08h -> a POINTER type;
- 09h -> a STRING type;
- 0Ah -> an 8087 Floating-Point type;
- 0Bh -> a REAL type;
- 0Ch -> a Fixed-Point ordinal type;
- 0Dh -> a BOOLEAN type;
- 0Eh -> a CHAR type;
- 0Fh -> an Enumerated ordinal type.
-
- +01: A Byte used as a modifier. Since the above scheme is too
- general for machine-dependent details such as storage
- width and sign control, this modifier byte supplies
- additional data. The author has identified several cases
- in which this information is vital but has not spent very
- much time on the subject. The chief areas of importance
- seem to be in the 8087 Floating-Point types, and the
- Fixed-Point ordinal types. The semantics seem to be as
- follows:
-
- 0A 00 -> The type "SINGLE"
- 0A 02 -> The type "EXTENDED"
- 0A 04 -> The type "DOUBLE"
- 0A 06 -> The type "COMP"
-
- 0C 00 -> an un-named BYTE integer
- 0C 01 -> The type "SHORTINT"
- 0C 02 -> The type "BYTE"
- 0C 04 -> an un-named WORD integer
- 0C 05 -> The type "INTEGER"
- 0C 06 -> The type "WORD"
- 0C 0C -> an un-named double-word integer
- 0C 0D -> The type "LONGINT"
-
- One important feature of the above semantics is the fact
- that an un-typed CONST declaration refers to the above two
- bytes to determine the storage space needed in the
- dictionary for the data value of the constant. This can
- be a little involved however as the constant may contain
- its own length descriptor (as in a string) in which case
- it may be sufficient to identify the high-level type
- category without any modifier byte.
-
- +02: A Word that contains the number of bytes of storage that
- are required to contain an object/entity of this type.
- For types that represent variable-length objects/entities
- such as strings, this word may define the value returned
- by the SIZEOF function as applied to the type.
-
- +04: A Word that is zero unless the descriptor is for an Object
- Method. In this case, the content is an LL to the
- Dictionary Header of the SUCCEEDING Method for the Object,
- in order of declaration, or zero if none.
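-
- The six prefix bytes and the two tables of observed values can be
- combined into a small decoder. The Python sketch below is an
- illustration only; read_type_prefix and the table names are
- hypothetical, and code/modifier pairs not listed above are reported
- without further detail.
-
```python
import struct

CATEGORIES = {
    0x00: "un-typed", 0x01: "ARRAY", 0x02: "RECORD", 0x03: "OBJECT",
    0x04: "FILE", 0x05: "TEXT", 0x06: "SUBPROGRAM", 0x07: "SET",
    0x08: "POINTER", 0x09: "STRING", 0x0A: "8087 float", 0x0B: "REAL",
    0x0C: "integer", 0x0D: "BOOLEAN", 0x0E: "CHAR", 0x0F: "enumerated",
}
MODIFIERS = {
    (0x0A, 0x00): "SINGLE", (0x0A, 0x02): "EXTENDED",
    (0x0A, 0x04): "DOUBLE", (0x0A, 0x06): "COMP",
    (0x0C, 0x00): "un-named byte", (0x0C, 0x01): "SHORTINT",
    (0x0C, 0x02): "BYTE", (0x0C, 0x04): "un-named word",
    (0x0C, 0x05): "INTEGER", (0x0C, 0x06): "WORD",
    (0x0C, 0x0C): "un-named double-word", (0x0C, 0x0D): "LONGINT",
}

def read_type_prefix(desc):
    """Decode the 6-byte Prefix Part of a Type Descriptor."""
    code, modifier, size, next_method = struct.unpack_from("<BBHH", desc, 0)
    return {
        "category": CATEGORIES.get(code, "unknown"),
        "detail": MODIFIERS.get((code, modifier)),
        "size": size,
        "next_method_ll": next_method,
        "suffix_at": 6,               # the Suffix Part follows directly
    }

# Made-up prefix bytes for the type INTEGER: 0C 05, two bytes wide.
prefix = read_type_prefix(bytes([0x0C, 0x05, 0x02, 0x00, 0x00, 0x00]))
print(prefix)
```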
-
-
- 4.4.4.3 SUFFIX PARTS
-
-
- Suffix Parts further refine the implementation details of the type and
- also provide subrange constraints where appropriate. In some cases
- the Suffix part is empty since all semantic data for the type is
- contained in the Prefix part.
-
-
- 4.4.4.3.1 UN-TYPED
-
-
- This Suffix Part is empty. Nothing is known about an un-typed entity.
-
-
- 4.4.4.3.2 STRUCTURED TYPES
-
-
- The structured types represent aggregates of lower-level types. We
- include ARRAY, RECORD, OBJECT, FILE, TEXT, SET, POINTER and STRING
- types in this category.
-
-
- 4.4.4.3.2.1 ARRAY TYPES
-
-
- The Suffix Part of the ARRAY type is so constructed as to be able to
- support recursive or nested definition of arrays. The suffix format
- is as follows:
-
- +00: An LG that locates the Type Descriptor for the "base-type"
- of the array. This is the type of the entity being
- arrayed (which may itself be an array).
-
- +04: An LG that locates the Type Descriptor for the array
- bounds which is a constrained ordinal type or subrange.
-
-
- 4.4.4.3.2.2 RECORD TYPES
-
-
- RECORD types have nested scopes. The Suffix part provides a base
- structure by which to locate the fields local to the scope of the
- Record type itself. The format is as follows:
-
- +00: A Word containing an LL which locates the local Hash Table
- that provides access to the fields in the nested scope.
-
- +02: A Word containing an LL which locates the Dictionary
- Header of the initial field in the nested scope. This
- supports a "left-to-right" traversal of the fields in a
- record.
-
-
- 4.4.4.3.2.3 OBJECT TYPES
-
-
- OBJECT types also have nested scopes. The Suffix part provides a base
- structure by which to locate the fields and METHODS local to the scope
- of the OBJECT type itself. In addition, inheritance and VMT
- particulars are stored. The format is as follows:
-
- +00: A Word containing an LL which locates the local Hash Table
- that provides access to the fields and METHODS local to
- the nested scope.
-
- +02: A Word containing an LL which locates the Dictionary
- Header of the initial field or METHOD in the nested scope.
- This supports a "left-to-right" traversal of the fields
- and METHODS in an OBJECT.
-
- +04: An LG which locates the Type Descriptor of the Parent
- Object. This field is zero if there is no such Parent.
-
- +08: A Word which contains the size in bytes of the VMT for
- this Object. This field is zero if the object employs no
- Virtual Methods, Constructors or Destructors.
-
- +0A: A Word which contains the offset within the CONST DSeg Map
- that locates the VMT skeleton or template segment. This
- field equals FFFFh if the object employs no Virtual
- Methods, Constructors or Destructors.
-
- +0C: A Word which contains the offset within an Object instance
- where the NEAR POINTER to the VMT for the object is stored
- (within the DATA SEGMENT). This field equals FFFFh if the
- object employs no Virtual Methods, Constructors or
- Destructors.
-
- +0E: A Word which contains an LL which locates the Dictionary
- Header for the name of the OBJECT itself.
-
- +10: A Word (not yet understood) containing $FFFF.
-
- +12: Three Words (not yet understood) containing zeroes.
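-
- A minimal Python sketch of this suffix follows. The function name
- read_object_suffix and both sample suffixes are made up for the
- illustration; the Parent LG is kept as a raw pair of words, and the
- four words at +10 onward are skipped because their purpose is not
- understood.
-
```python
import struct

ABSENT = 0xFFFF

def read_object_suffix(suffix):
    """Decode the 24-byte suffix that follows an OBJECT type prefix."""
    (hash_ll, first_ll, parent0, parent1, vmt_size, vmt_map, vmt_field,
     name_ll) = struct.unpack_from("<8H", suffix, 0)
    return {
        "hash_ll": hash_ll,
        "first_member_ll": first_ll,
        "parent_lg": (parent0, parent1) if (parent0 or parent1) else None,
        "vmt_size": vmt_size,
        "vmt_map_offset": None if vmt_map == ABSENT else vmt_map,
        "vmt_field_offset": None if vmt_field == ABSENT else vmt_field,
        "name_ll": name_ll,
    }

# Made-up root object with a 12-byte VMT whose pointer sits at offset 2.
with_vmt = read_object_suffix(struct.pack(
    "<12H", 0x0200, 0x0230, 0, 0, 12, 0x0008, 2, 0x01F0, 0xFFFF, 0, 0, 0))
# Made-up object with no virtual methods, constructors or destructors.
plain = read_object_suffix(struct.pack(
    "<12H", 0x0300, 0x0330, 0, 0, 0, 0xFFFF, 0xFFFF, 0x02F0, 0xFFFF,
    0, 0, 0))
print(with_vmt)
print(plain)
```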
-
-
- 4.4.4.3.2.4 FILE (NON-TEXT) TYPES
-
-
- This Suffix consists of an LG that locates the Type Descriptor of the
- base type of the file. Note that the Type Descriptor may be that of
- an un-typed entity (for un-typed files).
-
-
- 4.4.4.3.2.5 TEXT FILE TYPES
-
-
- This Suffix consists of an LG that locates the Type Descriptor of the
- base type of the file -- in this case SYSTEM.CHAR.
-
-
- 4.4.4.3.2.6 SET TYPES
-
-
- This Suffix consists of an LG that locates the base-type of the set
- itself. Pascal limits such entities to simple ordinals whose
- cardinality is limited to 256.
-
-
- 4.4.4.3.2.7 POINTER TYPES
-
-
- This Suffix consists of an LG that locates the base-type of the entity
- pointed at.
-
-
- 4.4.4.3.2.8 STRING TYPES
-
-
- This is a special case of an ARRAY type. The format is as follows:
-
- +00: An LG to the Type Descriptor SYSTEM.CHAR which is the base
- type of all Turbo Pascal Strings.
-
- +04: An LG to the Type Descriptor for the array bounds
- constraints for the string. When the unconstrained STRING
- type is used, this points to SYSTEM.BYTE which is defined
- as a subrange 0..255.
-
-
- 4.4.4.3.3 FLOATING-POINT TYPES
-
-
- The Suffix part for all Floating-Point types is EMPTY. All data
- needed to specify these approximate number types is contained in the
- Prefix part. The Types included in this class are SINGLE, DOUBLE,
- EXTENDED, COMP and REAL.
-
-
- 4.4.4.3.4 ORDINAL TYPES
-
-
- The Ordinal Types consist of the various "integer" types plus the
- BOOLEAN, CHAR and Enumerated types.
-
-
- 4.4.4.3.4.1 "INTEGERS"
-
-
- These types include BYTE, SHORTINT, WORD, INTEGER and LONGINT. Their
- Suffix parts are identical in format:
-
- +00: A double-word containing the LOWER bound of the subrange
- constraint on the type;
-
- +04: A double-word containing the UPPER bound of the subrange
- constraint on the type;
-
- +08: An LG that locates the Type Descriptor of the largest
- upward compatible type. This is the Type Descriptor that
- is used to control the width of an un-typed constant in
- the dictionary stub. For the "integer" types, this is an
- LG to SYSTEM.LONGINT.
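-
- The bounds are easy to read back. This Python sketch assumes that
- both double-words are signed (as LONGINT values would be); the names
- read_ordinal_suffix and fits, and the sample LG words, are made up
- for the illustration.
-
```python
import struct

def read_ordinal_suffix(suffix):
    """Decode the 12-byte suffix of an ordinal type: lower bound,
    upper bound, and the LG of the compatible type."""
    lower, upper, lg0, lg1 = struct.unpack_from("<llHH", suffix, 0)
    return {"lower": lower, "upper": upper, "compatible_lg": (lg0, lg1)}

def fits(value, bounds):
    """True when `value` satisfies the subrange constraint."""
    return bounds["lower"] <= value <= bounds["upper"]

# Made-up suffix for SHORTINT, whose subrange is -128..127.
shortint = read_ordinal_suffix(
    struct.pack("<llHH", -128, 127, 0x0310, 0x0020))
print(shortint)
print(fits(127, shortint), fits(128, shortint))
```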
-
-
- 4.4.4.3.4.2 BOOLEANS
-
-
- This type Suffix has the following format:
-
- +00: A double-word containing the LOWER bound of the subrange
- constraint on the type;
-
- +04: A double-word containing the UPPER bound of the subrange
- constraint on the type;
-
- +08: An LG that locates the Type Descriptor SYSTEM.BOOLEAN.
- There is no "upward compatible" type.
-
-
- 4.4.4.3.4.3 CHARS
-
-
- This type Suffix has the following format:
-
- +00: A double-word containing the LOWER bound of the subrange
- constraint on the type;
-
- +04: A double-word containing the UPPER bound of the subrange
- constraint on the type;
-
- +08: An LG that locates the Type Descriptor SYSTEM.CHAR. There
- is no "upward compatible" type.
-
-
- 4.4.4.3.4.4 ENUMERATIONS
-
-
- This type Suffix is unusual and has the following format:
-
- +00: A double-word containing the LOWER bound of the subrange
- constraint on the type;
-
- +04: A double-word containing the UPPER bound of the subrange
- constraint on the type;
-
- +08: An LG that locates the Prefix of the current Type
- Descriptor. There is no upward compatible type.
-
- What follows is a full-fledged SET Type Descriptor whose base type is
- the Type Descriptor of the Enumerated Type itself. The author has not
- yet discovered the reason for this.
-
- At least one case has been observed where a set type descriptor is
- followed by a word containing zero but I know of no explanation.
- Could this be a (shudder) BUG in Turbo?
-
-
- 4.4.4.3.5 SUBPROGRAM TYPES
-
-
- The length of this Suffix is variable. The format is as follows:
-
- +00: An LG that locates the Type Descriptor of the FUNCTION
- result returned by the Subprogram. This field is zero if
- the Subprogram is a PROCEDURE.
-
- +04: A Word that contains the number of Formal Parameters in
- the Function/Procedure header. If non-zero, then this
- word is followed by the parameter list itself as a simple
- array of parameter descriptors.
-
- The format of a parameter descriptor is as follows:
-
- +00: An LG that locates the Type Descriptor of the
- corresponding parameter;
-
- +04: A Byte that identifies the parameter passing
- mechanism used for this entry as follows:
-
- 02h -> VALUE of parameter is passed on STACK,
- 06h -> ADDRESS of parameter is passed on STACK.
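-
- A variable-length suffix of this kind can be walked as follows. The
- Python sketch assumes that the parameter descriptors are packed back
- to back at five bytes each, starting at +06; read_subprogram_suffix
- is a hypothetical name, the sample data is made up, and every LG is
- kept as a raw pair of words.
-
```python
import struct

PASSING = {0x02: "VALUE", 0x06: "ADDRESS"}

def read_subprogram_suffix(suffix):
    """Decode a SUBPROGRAM suffix; 'length' is its total size."""
    res0, res1, count = struct.unpack_from("<HHH", suffix, 0)
    params, pos = [], 6
    for _ in range(count):
        lg0, lg1, mode = struct.unpack_from("<HHB", suffix, pos)
        params.append({
            "type_lg": (lg0, lg1),
            "passing": PASSING.get(mode, "unknown (%02Xh)" % mode),
        })
        pos += 5                      # LG (4 bytes) plus the mode byte
    return {
        "result_lg": (res0, res1) if (res0 or res1) else None,
        "params": params,
        "length": pos,
    }

# Made-up PROCEDURE with one value parameter and one VAR parameter.
demo = (struct.pack("<HHH", 0, 0, 2)
        + struct.pack("<HHB", 0x0310, 0x0020, 0x02)
        + struct.pack("<HHB", 0x0350, 0x0020, 0x06))
info = read_subprogram_suffix(demo)
print(info)
```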
-
-
- 5. MAPS AND LISTS
-
-
- The "MAPS and LISTS" are not part of the symbol dictionary. Rather,
- these structures provide access to the Code and Data Segments produced
- by the compiler or included via the {$L name.OBJ} directive. The
- format and purpose (as understood by this author) of each of these
- tables is explained in the following sections.
-
-
- 5.1 PROC MAP
-
-
- The PROC Map provides a means of associating the various Function and
- Procedure declarations with the Code Segments. There is some evidence
- that the Compiler produces CODE (and DATA) Segments for EACH of the
- Subprograms defined in the Unit as well as for the un-named Unit
- Initialization code block. There is also evidence that EXTERNAL PROCs
- must be assembled separately in order to exploit fully the Turbo
- "Smart Linker" since Turbo Pascal places some significant restrictions
- on EXTERNAL routines in the area of Segment Names and Types.
- Specifically, only code segments named "CODE" and data segments named
- "DATA" or "CONST" will be used by the "Smart Linker" as sources of
- code and data for inclusion in a Turbo Pascal .EXE file. (Turbo 6.0
- relaxed Name constraints but only one code segment per .OBJ remains a
- limitation).
-
- The first entry in the PROC Map is reserved for the Unit
- Initialization block. If there is no Unit Initialization block, this
- entry will be filled with $FF. In addition, each and every PROC in
- the Unit has an entry in this table.
-
- If an EXTERNAL routine is included, then ALL PUBLIC PROC definitions
- in that routine must be declared in the Unit Source Code with the
- EXTERNAL attribute.
-
- The size of the PROC Map Table (in Bytes) is implied in the Unit
- Header by the LL's that occur at offsets +0C and +0E.
-
- The Format of a single PROC Map Entry is as follows:
-
- +00: A Word presumably reserved as a work area; always zero.
-
- +02: A Word presumably reserved as a work area; always zero.
-
- +04: A Word that contains an offset within the CSeg Map. This
- is used to locate the code segment containing the PROC.
-
- +06: A Word that contains an offset within the CODE Segment
- that defines the PROC entry point relative to the load
- point of the referenced CODE Segment.
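-
- Since every entry is eight bytes long, the whole table can be decoded
- in one pass. In this Python sketch read_proc_map is a hypothetical
- name and the sample table is made up; an entry filled with $FF is
- reported as None.
-
```python
import struct

ENTRY = struct.Struct("<4H")   # two work words, CSeg offset, entry point

def read_proc_map(table):
    """Decode a PROC Map into (cseg_map_offset, entry_point) pairs.
    An entry filled with $FF (no initialization block) becomes None."""
    procs = []
    for _, _, cseg_offset, entry_point in ENTRY.iter_unpack(table):
        if cseg_offset == 0xFFFF and entry_point == 0xFFFF:
            procs.append(None)
        else:
            procs.append((cseg_offset, entry_point))
    return procs

# Made-up map: no initialization block, then two PROCs.
demo = (b"\xff" * 8
        + struct.pack("<4H", 0, 0, 0x0000, 0x0000)
        + struct.pack("<4H", 0, 0, 0x0008, 0x0012))
procs = read_proc_map(demo)
print(procs)
```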
-
-
- 5.2 CSEG MAP
-
-
- The CSeg Map provides a convenient descriptor table for each CODE
- Segment present in the Unit and serves to relate these segments with
- the Segment Relocation Data and the Segment Trace Table. It seems
- reasonable to infer that the "Smart Linker" is able to include/exclude
- code/data at the SEGMENT level only.
-
- The CSeg Map is an array of fixed-length records whose format is as
- follows:
-
- +00: A Word apparently reserved for use by TURBO.
-
- +02: A Word that contains the Segment Length (in bytes).
-
- +04: A Word that contains the Length of the Fix-Up Data Table
- for this Code Segment (in bytes).
-
- +06: A Word that contains the offset of the Trace Table Entry
- for this Segment (if it was compiled with DEBUG Support).
- If there is no Trace Table for this segment, then this
- Word contains FFFFh.
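-
- A minimal sketch of reading this table, in Python rather than the
- Turbo Pascal of the supplied program. CSegEntry, parse_cseg_map and
- segment_starts are hypothetical names. The running offsets rely on the
- statement in section 6.1 that the code segments are stored without
- filler bytes between them.
-
```python
import struct
from typing import NamedTuple, Optional

CSEG_ENTRY = struct.Struct("<4H")  # four little-endian Words
NO_TRACE = 0xFFFF                  # +06 value meaning "no Trace Table"


class CSegEntry(NamedTuple):
    reserved: int                 # +00: reserved for use by TURBO
    seg_length: int               # +02: segment length in bytes
    fixup_length: int             # +04: length of the Fix-Up Data in bytes
    trace_offset: Optional[int]   # +06: Trace Table offset, None if absent


def parse_cseg_map(table: bytes) -> list:
    """Decode the map; raises struct.error if the size is not a multiple of 8."""
    entries = []
    for reserved, seg_len, fix_len, trace in CSEG_ENTRY.iter_unpack(table):
        entries.append(CSegEntry(reserved, seg_len, fix_len,
                                 None if trace == NO_TRACE else trace))
    return entries


def segment_starts(entries) -> list:
    """Offset of each segment from the start of the Object CSegs area."""
    starts, pos = [], 0
    for entry in entries:
        starts.append(pos)
        pos += entry.seg_length
    return starts
```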
-
-
- 5.3 TYPED CONST DSEG MAP
-
-
- The CONST DSeg Map provides a convenient descriptor table for each
- DATA Segment which was spawned by the presence of Typed Constants or
- VMT's in the Pascal Code. It serves to relate these segments with the
- Segment Fix-Up (relocation) Data and with the Code Segments that refer
- to these DATA elements. One entry is present for each CONST
- declaration part containing typed constants and for each CONST segment
- linked from an ".OBJ" file. The CONST DSeg Map is an array of fixed-
- length records whose format is as follows:
-
- +00: A Word apparently reserved for use by TURBO.
-
- +02: A Word that contains the Segment Length (in bytes).
-
- +04: A Word that contains the Length of the Fix-Up Data Table
- for this DATA Segment (in bytes).
-
- +06: A Word that contains an LL which locates the OBJECT that
- owns this VMT template or zero if the segment is not a VMT
- template.
-
- One can determine the defining block for a Typed Constant declaration
- and our program attempts to do just that. A by-product of the
- dictionary mapping algorithm allows the declaring block to be found
- and its qualified name printed. This information is also used to
- explain fix-up data as to its source. Results will be incomplete
- unless a really comprehensive dictionary is present in the unit.
-
-
- 5.4 GLOBAL VAR DSEG MAP
-
-
- The VAR DSeg Map provides a convenient descriptor table for each DATA
- Segment present in the Unit.
-
- One entry exists for each CODE segment which refers to GLOBAL VAR's
- allocated in the DATA Segment. These references may be seen in the
- Fix-Up Data Table. Each EXTERNAL CSeg having a segment named DATA
- also spawns an entry in this table. Only the Code Segments that meet
- these criteria cause entries to be generated in the VAR Dseg Map.
-
- The VAR DSeg Map is an array of fixed-length records whose format is
- as follows:
-
- +00: A Word apparently reserved for use by TURBO.
-
- +02: A Word that contains the Segment Length (in bytes). This
- may be zero, especially if the EXTERNAL routine contains a
- DATA segment whose sole purpose is to declare one or more
- EXTRN symbols that are defined in some DATA segment
- external to the Assembly.
-
- +04: A Word apparently reserved for use by TURBO.
-
- +06: A Word apparently reserved for use by TURBO.
-
- One can determine the defining block for a Global VARiable declaration
- and our program attempts to do just that. A by-product of the
- dictionary mapping algorithm allows the declaring block to be found
- and its qualified name printed. This information is also used to
- explain fix-up data as to its source. Results will be incomplete
- unless a really comprehensive dictionary is present in the unit. Such
- DSegs can be referenced by many CSegs and we only locate the first
- one. This is okay for Pascal code but it's ambiguous for assembler
- since the names may be PUBLIC and referenced by more than one module.
-
-
- 5.5 DONOR UNIT LIST
-
-
- This list contains an entry for each Unit (taken from the "USES" list)
- which MAY contribute either CODE or DATA to the executable file. Not
- all units do make such a contribution as some exist merely to define a
- collection of Types, etc. A Unit gets into this list if there exists
- at least one Fix-Up Data Entry that references CODE or DATA in that
- Unit.
-
- The list is composed of elements whose SIZE is variable and whose
- format is as follows:
-
- +00: A WORD apparently reserved for use by TURBO.
-
- +02: A variable-length String containing the unit name.
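-
- Because the entries vary in size, the list has to be walked from the
- front. The sketch below (Python, hypothetical names) assumes that the
- "variable-length String" is an ordinary Turbo Pascal string, i.e. a
- length byte followed by that many characters, and that entries follow
- one another without padding. The result is keyed by entry offset
- because the Fix-Up Data refers to donor units by their offset within
- this list.
-
```python
import struct


def read_pstring(buf: bytes, pos: int):
    """Read a length-prefixed string; return (text, position after it)."""
    length = buf[pos]
    end = pos + 1 + length
    if end > len(buf):
        raise ValueError("string runs past the end of the list")
    return buf[pos + 1:end].decode("ascii", errors="replace"), end


def parse_donor_units(buf: bytes) -> dict:
    """Map each entry's offset within the list to its unit name."""
    units, pos = {}, 0
    while pos + 3 <= len(buf):          # Word + length byte at minimum
        entry_offset = pos
        (_reserved,) = struct.unpack_from("<H", buf, pos)  # +00: reserved
        name, pos = read_pstring(buf, pos + 2)             # +02: unit name
        units[entry_offset] = name
    return units
```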
-
-
- 5.6 SOURCE FILE LIST
-
-
- This list contains an entry for each "source" file used to compile the
- Unit. This includes the Primary Pascal file, files containing Pascal
- code included by means of the {$I filename.xxx} compiler directive,
- and .OBJ files included by the {$L filename.OBJ} compiler directive.
-
- The order of entries in this list is critical since it maps the CODE
- segments stored in the unit. The order of the entries is as follows:
-
- 1) The Primary Pascal file;
-
- 2) All Included Pascal files;
-
- 3) All Included .OBJ files.
-
- Mapping of CSegs to files is done as follows:
-
- a) Each .OBJ file contributes a SINGLE Code Segment (if any).
- Note that this author has not observed an .OBJ module that
- contains only a DATA Segment (but that seems a distinct
- possibility).
-
- b) The Primary Pascal file (augmented by all included Pascal
- Files) contributes zero or more CODE Segments.
-
- Therefore, there are at least as many CSeg entries as .OBJ files. If
- more, then the excess entries (those at the front of the list) belong
- to the Pascal files that make up the Pascal source for the unit.
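-
- The counting rule can be sketched as follows (Python, hypothetical
- names). It assumes that every file flagged 05h in the entry format of
- this section contributes exactly one Code Segment, and that those
- segments come last and in list order.
-
```python
OBJ_FLAG = 0x05  # flag byte of an .OBJ file that contains a CODE segment


def assign_csegs_to_files(cseg_count: int, files: list) -> list:
    """files: (flag, name) pairs in Source File List order.

    Returns one label per CSeg Map entry, front to back."""
    obj_names = [name for flag, name in files if flag == OBJ_FLAG]
    pascal_count = cseg_count - len(obj_names)
    if pascal_count < 0:
        raise ValueError("fewer CSegs than .OBJ files")
    # The excess entries at the front belong to the Pascal source.
    return ["<Pascal source>"] * pascal_count + obj_names
```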
-
- The format of an entry in this list is as follows:
-
- +00: A flag byte that indicates the type of file represented;
-
- 04h -> the Primary Pascal Source File,
- 03h -> an Included Pascal Source File,
- 05h -> an .OBJ file that contains a CODE segment.
-
- +01: A Word apparently reserved for use by the Compiler/Linker.
-
- +03: A Word that is zero for .OBJ files and which contains the
- file directory time-stamp for Pascal Files.
-
- +05: A Word that is zero for .OBJ files and which contains the
- file directory date-stamp for Pascal Files.
-
- +07: A variable-sized string containing the filename and
- extension of the file used during compilation.
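-
- A sketch of decoding these entries in Python (hypothetical names). It
- assumes that the file name is a length-prefixed Pascal string and that
- the two stamps use the standard packed DOS directory format (time:
- hour in bits 15-11, minute in bits 10-5, seconds/2 in bits 4-0; date:
- year-1980 in bits 15-9, month in bits 8-5, day in bits 4-0). The
- report does not confirm either assumption.
-
```python
import struct
from datetime import datetime

FILE_KINDS = {0x04: "primary Pascal source",
              0x03: "included Pascal source",
              0x05: ".OBJ file with CODE segment"}


def dos_stamp(time_word: int, date_word: int):
    """Decode packed DOS time/date Words; None when both are zero."""
    if time_word == 0 and date_word == 0:
        return None  # the report says .OBJ files carry zero stamps
    # datetime raises ValueError if the Words do not form a valid date.
    return datetime(1980 + (date_word >> 9), (date_word >> 5) & 0x0F,
                    date_word & 0x1F, time_word >> 11,
                    (time_word >> 5) & 0x3F, (time_word & 0x1F) * 2)


def parse_source_files(buf: bytes) -> list:
    """Return (kind, file name, timestamp) for each entry in the list."""
    entries, pos = [], 0
    while pos + 8 <= len(buf):
        flag, _reserved, time_word, date_word, name_len = \
            struct.unpack_from("<BHHHB", buf, pos)
        name = buf[pos + 8:pos + 8 + name_len].decode("ascii", errors="replace")
        entries.append((FILE_KINDS.get(flag, "unknown"), name,
                        dos_stamp(time_word, date_word)))
        pos += 8 + name_len
    return entries
```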
-
-
- 5.7 DEBUG TRACE TABLE
-
-
- If Debug support was selected at compile time, then all Pascal code
- which supports Debugging produces an entry in this table. The table
- entries themselves are variable in size and have the following format:
-
- +00: A Word which contains an LL that locates the Dictionary
- Header of the Symbol (a PROC name) this entry represents.
-
- +02: A Word which contains the offset (within the Source File
- List) of the entry that names the file that generated the
- CSeg being traced. This allows the file included by means
- of the {$I filename} directive to be identified for DEBUG
- purposes, as well as code produced from the Primary File.
-
- +04: A Word containing the number of bytes of data that precede
- the BEGIN statement code in the segment. For Pascal PROCS
- these bytes consist of literal constants, un-typed
- constants, and other data such as range-checking limits,
- etc.
-
- +06: A Word containing the Line Number of the BEGIN statement
- for the PROC.
-
- +08: A Word containing the number of lines of Source Code to
- Trace in this Segment.
-
- +0A: An array of bytes whose size is at least the number of
- source code lines in the PROC. Each byte contains the
- number of bytes of object code in the corresponding source
- line. This appears to be an array of SHORTINT since, if a
- "line" contains more than 127 bytes, a single byte of $80
- precedes the actual byte count as a sort of "escape" and
- the next byte records the count (up to 255 bytes) for the
- line. This situation has not yet been fully explored. We
- do not yet know what happens in the event a line is
- credited with spawning more than 255 bytes of code.
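-
- The byte array at +0A can be decoded along these lines. This is a
- Python sketch with hypothetical names. It follows the $80 "escape"
- reading given above and makes no attempt to handle lines of more than
- 255 bytes. It also assumes that the code for the first traced line
- starts right after the bytes counted at +04.
-
```python
ESCAPE = 0x80  # marks a count that did not fit in 0..127


def decode_line_sizes(data: bytes, line_count: int) -> list:
    """Return the object-code byte count for each traced source line.

    Raises IndexError if the array is shorter than line_count implies."""
    sizes, pos = [], 0
    while len(sizes) < line_count:
        value = data[pos]
        pos += 1
        if value == ESCAPE:      # the real count is in the next byte
            value = data[pos]
            pos += 1
        sizes.append(value)
    return sizes


def line_offsets(sizes: list, leading_bytes: int) -> list:
    """Segment offset at which the code of each traced line begins."""
    offsets, pos = [], leading_bytes
    for size in sizes:
        offsets.append(pos)
        pos += size
    return offsets
```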
-
-
- 6. CODE, DATA, FIX-UP INFO
-
-
- This area begins at the start of the next free PARAGRAPH (16-byte
- boundary). This means that its offset from the beginning of the Unit,
- written in hexadecimal, ALWAYS ends in the digit zero.
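-
- Rounding an offset up to the next paragraph is a one-line
- calculation. The Python sketch below uses the hypothetical name
- next_paragraph and the 8086 paragraph size of 16 bytes.
-
```python
PARAGRAPH = 16  # bytes in one 8086 paragraph


def next_paragraph(offset: int) -> int:
    """Round offset up to a multiple of 16 (unchanged if already aligned)."""
    return (offset + PARAGRAPH - 1) & ~(PARAGRAPH - 1)
```
-
- For example, an area that ends at offset 1234h is followed by one that
- starts at 1240h.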
-
- This area contains the CODE segments, CONST DATA segments, and the
- Relocation (Fix-Up) Data required for linking.
-
-
- 6.1 OBJECT CSEGS
-
-
- Each CODE segment included in the unit appears here as specified by
- the CSeg Map Table. Depending on usage, these segments may appear in
- the executable file. There are no filler bytes between segments.
-
-
- 6.2 CONST DSEGS
-
-
- This section begins at the start of the first free PARAGRAPH following
- the end of the Object CSegs. This means that its offset from the
- beginning of the Unit (in hexadecimal) ALWAYS ends in the digit zero.
-
- A DATA segment fragment appears here for each CSeg that declares a
- typed constant, and for each OBJECT which employs Virtual Methods,
- Constructors or Destructors. There are no filler bytes between
- segments.
-
- If local symbols were generated, there is always enough information to
- allow documenting the scope of the declaration as well as interpreting
- the data in the display since the needed type declarations would also
- be available. Our program merely identifies the defining block.
-
-
- 6.3 FIX-UP DATA TABLE
-
-
- This table begins at the start of the first free PARAGRAPH following
- the end of the CONST DSegs. This means that its offset from the
- beginning of the Unit (in hexadecimal) ALWAYS ends in the digit zero.
- There are two sections in this table: one for code, and one for data.
- Both sections are aligned on paragraph boundaries. This may result in
- a "slack" entry between the code and data sub-sections, but this entry
- is included in the byte tally for the section stored in the Unit
- Header Table at UHZFA (offset +22).
-
- The table begins with entries for the CSeg Map and ends with entries
- for the CONST DSeg Map. The appropriate Map entry specifies the
- number of bytes of Relocation Data for the corresponding segment.
- This number may be zero in which case there is no Relocation Data for
- the given segment.
-
- The Table consists of an array of eight (8) byte entries whose format
- is as follows:
-
- +00: A Byte containing the offset within the Donor Unit List of
- the Unit name that this entry refers to. This can be the
- compiled Unit or some previously compiled external unit.
-
- +01: A Byte of BIT switches that identify the type of reference
- and the size of the needed fix-up (WORD or DWORD). A lot
- of guess-work led to the following interpretation:
-
- 7654 (bits 3-0 don't seem to be used)
-
- 00-- Locate item via a PROC Map,
- 01-- Locate item via a CSeg Map,
- 10-- Locate item via a Global VAR DSeg Map,
- 11-- Locate item via a Const DSeg Map,
- --00 WORD offset has NO effective address adjustment,
- --01 WORD offset HAS an effective address adjustment,
- --10 WORD SEGMENT-Only fix-up (address of some PUBLIC
- segment),
- --11 DWORD (FAR) pointer; possible effective address
- adjustment.
-
- +02: A Word containing the offset within the Map table
- referenced according to the above code scheme.
-
- +04: A Word containing an offset within the target segment
- which will be added to the effective address. For
- example, a reference to the VAR DSeg Map will require a
- final offset to locate the item (variable) within the DATA
- SEGMENT being referenced here. This may also be needed
- for references to LITERAL DATA embedded in a CODE SEGMENT.
-
- +06: A Word containing the offset within the CODE or DATA
- segment owning this entry that contains the area to be
- patched with the value of the final effective address.
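-
- The eight-byte entry and its switch byte can be decoded as sketched
- below. This is Python with hypothetical names, and the meaning given
- to bits 7-4 is the guess-work interpretation tabulated above, not a
- documented format.
-
```python
import struct
from typing import NamedTuple

FIXUP = struct.Struct("<BBHHH")  # Byte, Byte, three Words = 8 bytes

TARGET_MAP = {0: "PROC Map", 1: "CSeg Map",
              2: "Global VAR DSeg Map", 3: "CONST DSeg Map"}      # bits 7-6
FIXUP_KIND = {0: "WORD offset, no adjustment",
              1: "WORD offset with adjustment",
              2: "WORD segment-only",
              3: "DWORD far pointer"}                             # bits 5-4


class FixUp(NamedTuple):
    unit_offset: int   # +00: offset within the Donor Unit List
    target_map: str    # +01, bits 7-6
    kind: str          # +01, bits 5-4
    map_offset: int    # +02: offset within the selected Map
    adjustment: int    # +04: offset added to the effective address
    patch_offset: int  # +06: where in the owning segment to patch


def parse_fixups(table: bytes) -> list:
    """Decode a run of Fix-Up entries (length must be a multiple of 8)."""
    result = []
    for unit_off, flags, map_off, adjust, patch in FIXUP.iter_unpack(table):
        result.append(FixUp(unit_off, TARGET_MAP[flags >> 6],
                            FIXUP_KIND[(flags >> 4) & 0x03],
                            map_off, adjust, patch))
    return result
```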
-
-
- 7. SUPPLIED PROGRAM
-
-
- In order that the above information be made constructively useful, the
- author has designed a program that automates the process of discovery.
- It is not a "handsome" program and it is not a work of art. It does
- give useful results provided your PC has enough available memory.
-
- It should be obvious that the program was not designed "top-down".
- Rather, it just evolved as each new discovery was made. Later on, it
- seemed reasonable to try to document some of the relations between the
- various lists and tables and the program tries to make some of these
- relations clear, albeit with varying degrees of success.
-
-
- 7.1 TPU6
-
-
- This is the main program. It will ask for the name of the unit to be
- documented. Reply with the unit name only. The program will append
- the ".TPU" extension and will search for the proper file. It will
- also search TURBO.TPL if necessary.
-
- The program will then ask if Dis-Assembly is desired and will require
- a "y" or "n" answer. If "y", it also asks about the CPU.
-
- The current directory will be searched first, followed by all
- directories in the current PATH. If the .TPU file is not found, the
- program will search for it in the "TURBO.TPL" (Turbo Pascal Library)
- file. Units in the "USES" list(s) will also be loaded to enable
- resolution of LG items.
-
- If the desired unit is found, the program will write a report to the
- current directory named "unitname.lst" which contains its analysis.
- The format of the report is such that it may be copied to a printer if
- that printer supports TTY control codes with form-feeds. Be judicious
- in doing this, however, since there can be a lot of information. The
- Turbo SYSTEM.TPU unit file produces almost ninety (90) pages without
- the disassembly option. When disassembly is requested for the SYSTEM
- unit, the size of the output file exceeds 700K bytes.
-
-
- 7.1.1 UNIT TPU6AMS
-
-
- This Unit contains all Type Definitions, Structures, and primitive
- Functions and Procedures required by the program. All structures
- documented in this report are also documented in TPU6AMS by means of
- the TYPE mechanism. Some of the structures are difficult if not
- impossible to handle using ISO Pascal but Turbo Pascal provides the
- means for getting the job done.
-
-
- 7.1.2 UNIT TPU6EQU
-
-
- This Unit is new and contains constants and types of general utility
- that are not strictly unit related. It also contains the pointer
- manipulation routines that are sensitive to the particular version of
- Turbo Pascal in use (here, Version 6.0). It also contains a Heap Error
- Function that keeps track of the high-water mark of Heap Utilization
- of any program that uses it. This function gets installed
- automatically.
-
-
- 7.1.3 UNIT TPU6UTL
-
-
- This Unit is new. It contains the higher-level analysis algorithms
- formerly located in the main program and in TPU6AMS. The algorithms
- have been re-cast with object-orientation in mind and have potential
- for re-use in other contexts. The unit computes a cover for the
- dictionary and deduces relationships between dictionary, code, data
- and the CSeg, PROC, CONST and VAR Maps discussed in section 5. This
- information is retrieved by the main program to drive the printing
- process.
-
- This Unit also loads all units specified in the USES list of the prime
- unit to allow the names of externally defined types to be recovered on
- the report. Array bounds are also retrieved in this way. The code
- will search for needed units in TURBO.TPL without intervention. Close
- attention is paid to Heap Management and minimal utilization of Heap
- storage. The dictionary areas of the Units located in the USES list
- get loaded into the Heap at no extra charge. Nothing but the
- dictionary area is of any use at this point. The name and fully-
- qualified file name of each unit successfully loaded are printed at
- the top of the listing. Unit version numbers must agree or the unit
- will not be loaded. Dictionary covers are computed for each loaded
- unit to aid in rapid LG-resolution.
-
-
- 7.1.4 UNIT TPU6RPT
-
-
- This is a Unit that contains the text-file output primitives required
- by the main program. It's not very pretty but it does work.
-
-
- 7.1.5 UNIT TPU6UNA
-
-
- This unit is a rudimentary disassembler. The output will not assemble
- and may look strange to a "real" assembler programmer since I am not
- well-qualified in this area. However, the basis for support of 80286,
- 80386 etc. processors is present as well as coprocessor support. Of
- perhaps the greatest interest is that it does appear to decode the
- emulated coprocessor instructions that are implemented via INT 34h-3Dh.
-
- Be warned however. The output is not guaranteed since this was coded
- by myself and I am perhaps the rankest amateur that ever approached
- this quite awful assembler language. For convenience, the operand
- coding mimics TASM "Ideal" mode.
-
- As is usual with programs of this type, error-recovery is minimal and
- no context checking is performed. If the operation code is found to
- be valid, then a valid instruction is assumed -- even if invalid
- operands are present.
-
- The only positives that apply to this program are that it doesn't slow
- the cpu down (although a lot more output is produced), and it does let
- one "tune" code for compactness by letting one view the results of the
- coding directly. Also, incomplete instructions are handled as data
- rather than overrunning into the next proc.
-
-
- 7.2 MODIFICATIONS
-
-
- It was intended from the beginning that this program should be able to
- be enhanced to permit external units to be referenced during the
- analysis of any given unit, even if they were library components.
- Since the original release of this document, the program has been so
- enhanced.
-
- This program was NOT intended as a pilot for some future product. It
- WAS intended as a rather "ersatz" tool for myself.
-
-
- 7.3 NOTES ON PROGRAM LOGIC
-
-
- The following sections discuss a few of the methods employed by the
- supplied program.
-
-
- 7.3.1 FORMATTING THE DICTIONARY
-
-
- Printing the unit dictionary area in a way that exposes its underlying
- semantics is no small task. The unit dictionary area itself is a
- rather amorphous-looking mass of data composed of hash tables,
- dictionary headers and stubs, type descriptors, etc. In order to
- present all this information in a meaningful way, we have to reveal
- its structure and this cannot be done by means of a sequential
- "browse" technique. Rather, we have to visit all nodes in the
- dictionary area so that each may be formatted in a way that exposes
- its function and meaning. This is made necessary by the fact that
- items are added to the dictionary as encountered and no convenient
- ordering of entry types exists. What we have here is the problem of
- finding a minimal "cover" for the dictionary area that properly
- exposes the content and structure of the dictionary area.
-
- To do this, we construct (in the heap) a stack and a queue, both of
- which are initially empty. The entries we put in the stack identify
- the class of entry (Hash Table, Dictionary Header, Type Descriptor or
- In-Line Code group), the location of the structure, and the location
- of its immediate "owner" or "parent" dictionary entry (which allows
- some limited information about scope to be printed).
-
- To the empty stack, we add an entry for the unit name dictionary
- entry, the INTERFACE hash table, and the Debug hash table. All these
- are located via direct pointers (LL's) in the Unit Header Table. We
- then pop one entry off the stack and begin our analysis.
-
- a) If the entry we popped off the stack is not present in the
- queue, we add it and call a routine that can interpret the entry
- (aka, "cover") for a Dictionary Header, Hash Table, or Type
- Descriptor. (This may lead to additional entries being added to
- the stack such as nested-scope hash tables, Dictionary Headers,
- Type Descriptors or In-Line Code group entries.)
-
- b) While the stack is not empty, we pop another entry and repeat
- step "a" (above) until no more entries are available.
-
- The result is a queue containing one entry for each structure in the
- unit dictionary area that is identifiable via traversal. (In
- practice, the method we use is similar to a "breadth-first" traversal
- of an n-way tree that is implemented in non-recursive fashion.) Each
- entry in the queue contains the information described above and the
- queue itself thus forms a set of descriptors that drive the process of
- formatting the dictionary area for display. The process may be
- likened to "painting by the numbers" or to finding a way to lay tile
- on a flat surface using tiles of four different irregular shapes until
- the floor is exactly covered.
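-
- The stack-and-queue procedure can be sketched as follows. This is
- Python, not the supplied Turbo Pascal, and Node, compute_cover and
- children_of are hypothetical names. In particular, children_of stands
- in for the routines that interpret a structure and report the further
- structures it refers to, and "present in the queue" is assumed to mean
- "same class at the same location".
-
```python
from collections import namedtuple

# kind: class of entry; location: offset in the dictionary area;
# owner: location of the immediate parent dictionary entry (or None).
Node = namedtuple("Node", "kind location owner")


def compute_cover(roots, children_of) -> list:
    """Visit every structure reachable from roots exactly once."""
    stack = list(roots)        # work still to be done
    queue, seen = [], set()    # structures already covered
    while stack:
        node = stack.pop()
        key = (node.kind, node.location)
        if key in seen:        # already in the queue: nothing to do
            continue
        seen.add(key)
        queue.append(node)
        stack.extend(children_of(node))  # may add nested structures
    return queue
```
-
- Sorting the returned queue by location gives the order in which the
- dictionary area can be formatted from front to back.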
-
- There is one significant limitation that needs to be pointed out. It
- is not always possible to determine the "parent" or "owner" of a node
- with certainty. The following discussion illustrates the problem of
- finding the "real" parent of a Type Descriptor.
-
- Almost every "type" in Turbo Pascal is actually derived from the basic
- types that are defined in the SYSTEM.TPU unit -- e.g. "INTEGER",
- "BYTE", etc. In addition, several of the Type Descriptors in the
- SYSTEM unit are referenced by more than one Dictionary Entry. Thus,
- we find that a "many-to-one" relationship may exist between Dictionary
- Entries and Type Descriptors. How does one find out which is the
- entry that actually gave rise to the Type Descriptor?
-
- The Dictionary Area of a unit has some special properties, one of
- which is the fact that the Dictionary Entries for named Types are
- often located quite near their primary type descriptors. The
- Dictionary Area seems to be treated as an upward growing heap with the
- various structures being added by Turbo as encountered. This makes it
- likely that the Type "Q" header which gives rise to a type descriptor
- will occur earlier in the Dictionary Area than any other
- header which refers to the same descriptor. We take advantage of this
- property to allocate "ownership" but it may not be "fool-proof". Some
- type descriptors are spawned by other type descriptors, especially for
- structured types. We don't attempt to allocate "ownership" to these
- "lower-level" descriptors but we do try to keep track of scope
- information.
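-
- The "earliest header wins" rule can be sketched in a few lines of
- Python. The name assign_owners and the (header, descriptor) pair
- representation are hypothetical, and the result is only as reliable as
- the heuristic itself.
-
```python
def assign_owners(references) -> dict:
    """references: (header_offset, descriptor_offset) pairs.

    Returns descriptor_offset -> offset of the earliest referring header."""
    owners = {}
    for header, descriptor in references:
        if descriptor not in owners or header < owners[descriptor]:
            owners[descriptor] = header
    return owners
```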
-
- A useful by-product of the above process is the ability to discover
- many of the associations between Global Variables, Typed CONST's,
- VMT's and the blocks in which they are declared or defined.
-
-
- 7.3.2 THE DISASSEMBLER
-
-
- To start with, I apologize up front for mistakes which are bound to be
- present in this routine. I am not really a MASM or TASM programmer
- and I will not pretend otherwise. This being the case, the formatting
- I have chosen for the operands may be erroneous or misleading and
- might (if submitted to one of the "real" assemblers) produce object
- code quite different from what is expected. I hope not, but I have to
- admit it's possible.
-
- My intention in adding this unit was to support hand-tuning of object
- code. With practice and some effort, one can observe the effect on
- the object module caused by specific Pascal coding. Thus, where
- compactness or speed is an issue of paramount importance, TPU6UNA can
- be of help. In some cases, a simple re-arrangement of the local
- variable declarations in a procedure can have a significant effect on
- the size of the code if it means the difference between 1 and 2-byte
- displacements for each instruction that references a specific local
- variable. Potential applications along these lines seem almost
- unlimited.
-
- I adopted an operand format not unlike that of TASM "Ideal" mode since
- it was more convenient to do so and looked more readable to me. I
- relied on several reference books for guidance in decoding the entire
- mess and I found that there were several flaws (read ERRORS) in some
- of them which made the job that much more difficult. I then
- compounded my problems by attempting to handle 80386 specific code
- even though Turbo Pascal does not yet generate code specific to these
- processors. I simply felt that the effort involved in writing any
- sort of Dis-Assembly program for Turbo Pascal units was an effort best
- experienced not more than once. With all this self-flagellation out
- of my system once and for all, I will try to show the basic strategy
- of the program and to explain the limitations and some of the
- discoveries I made.
-
- The routine is intended to be idiotically simple - i.e., no smarter
- than the DEBUG command in principle. The basic idea is: pass some
- text to the routine and get back ONE line derived from some prefix of
- that text. Repeat as necessary until all text is gone. Thus, there
- is no attempt to check the context of the text being processed. Also,
- some configurations of the "modR/M" byte may be invalid for selected
- instructions. I don't try to screen these out since the intent was to
- look at the presumably correct code produced by TURBO Pascal -- not
- devious assembly language. Also, this program regards WAIT operations
- as "stand-alone" -- i.e., it doesn't check to see if a coprocessor
- operation follows for which the WAIT might be regarded as a prefix.
-
- One area of real difficulty was figuring out the Floating-Point
- emulations used by Turbo Pascal that are implemented by means of
- interrupts $34 through $3D. I don't know if I got it right, but the
- results seem reasonable and consistent. In the listing, the Interrupt
- is produced on one line, followed by its parameters on the next line.
- The parameter line is given the op-code "EMU_xxxx" where "xxxx" is the
- coprocessor op-code I felt was being emulated. Interrupt $3C was a
- real puzzler but after seeing a lot of code in context, I think that
- the segment override is communicated to the emulator by means of the
- first byte after the $3C.
-
- Normally, in a non-emulator environment, all coprocessor operations
- (ignoring any WAIT prefixes) begin with $D8-$DF. What Borland (and
- maybe Microsoft) seem to have done here is to change the $D8-$DF so
- that bits 7 and 6 of this byte are replaced with the one's complement
- of the 2-bit segment register number found in various 8086
- instructions. This seems to be how an override for the DS register is
- passed to the emulator. I don't KNOW this to be the correct
- interpretation, but the code I have examined in context seems to work
- under this scheme, so TPU6UNA uses it to interpret the operand
- accordingly.
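-
- If the interpretation above is right, the byte that follows interrupt
- $3C can be unpacked as in this Python sketch. The function name is
- hypothetical, and the scheme itself is the unverified reading
- described in this section. The register numbers are the usual 8086
- encoding (ES=0, CS=1, SS=2, DS=3).
-
```python
SEGMENT_REGISTERS = ("ES", "CS", "SS", "DS")  # 8086 2-bit register numbers


def decode_emulated_escape(byte: int):
    """Return (segment register, restored $D8-$DF opcode byte).

    Bits 7-6 are taken to hold the one's complement of the segment
    register number; setting them back to 11 restores the escape opcode."""
    seg_number = (byte >> 6) ^ 0x03
    opcode = byte | 0xC0
    return SEGMENT_REGISTERS[seg_number], opcode
```
-
- Under this reading a byte of $99 would stand for opcode $D9 with a CS
- override.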
-
- For 80x86 machines, the problem was somewhat simpler. TPU6UNA takes a
- quick look at the first byte of the text. Almost any byte is valid as
- the initial byte of an instruction, but some instructions require more
- than one byte to hold the complete operation code. Thus, step 1
- classifies bytes in several ways that lead to efficient recognition of
- valid operation codes.
-
- Once the instruction has been identified in this way, it is more or
- less easy to link to supplemental information that provides operand
- editing guidance, etc.
-
- The tables that embody the recognition scheme were constructed using
- PARADOX 3.0 (another fine Borland product) and suitably coded queries
- were used to generate the actual Turbo Pascal code for compilation.
-
- For those that are interested, TPU6UNA supports the address-size and
- operand-size prefixes of the 80386 as well as 32-bit operands and
- addresses but remember that Turbo Pascal doesn't generate these.
- Provision is made for a trivial change that allows segments which
- default to 32-bit mode to be handled as well.
-
- There is a simple mode variable that gets passed to TPU6UNA by its
- caller which specifies the most-capable processor whose code is to be
- handled. Codes are provided for the 8086 (8088 is the same), 80186
- (same as 80286 without protected mode instructions), 80286 (80186 plus
- protected mode), and 80386. You now get asked which one to use.
-
- No such specifier is provided for coprocessor support. What is there
- is what I think an 80387 supports. I don't think that this is really
- a problem if you don't try to use TPU6UNA for anything but Turbo
- Pascal code.
-
- Error recovery is predictably simple. The initial text byte is output
- as the operand of a DB pseudo-op and provision is made to resume work
- at the next byte of text.
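-
- The driving loop and its recovery rule can be sketched as follows.
- This is Python with hypothetical names. The toy decoder recognizes
- only three opcodes and is there just to exercise the loop; it says
- nothing about how TPU6UNA's tables are organized.
-
```python
def disassemble(text: bytes, decode_one) -> list:
    """decode_one(text, pos) returns (line, length), or None on failure."""
    lines, pos = [], 0
    while pos < len(text):
        decoded = decode_one(text, pos)
        if decoded is None:
            # Recovery: emit the byte as data and resume at the next one.
            lines.append("DB 0%02Xh" % text[pos])
            pos += 1
        else:
            line, length = decoded
            lines.append(line)
            pos += length
    return lines


def toy_decoder(text: bytes, pos: int):
    """Recognize NOP, RET and INT nn only (illustration, not TPU6UNA)."""
    opcode = text[pos]
    if opcode == 0x90:
        return ("NOP", 1)
    if opcode == 0xC3:
        return ("RET", 1)
    if opcode == 0xCD and pos + 1 < len(text):
        return ("INT 0%02Xh" % text[pos + 1], 2)
    return None  # unknown or incomplete instruction
```
-
- An INT opcode in the last byte of the text has no operand, so the toy
- decoder rejects it and the loop emits it as a DB line.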
-
- I hope this program is found to be useful in spite of the errors it
- must surely contain. I have yet to make much sense of the rules for
- MASM or TASM operand coding and I found very little of value in many
- of the so-called "texts" on the subject. I found myself in the
- position of that legendary American in England watching a Cricket
- match for the first time ("You mean it has RULES?").
-
-
- 8. UNIT LIBRARIES
-
-
- I have examined .TPL files in passing and feel that their structure is
- trivial. It's so easy to handle them that the program now routinely
- examines TURBO.TPL to resolve named types.
-
-
- 8.1 LIBRARY STRUCTURE
-
-
- A Turbo Pascal Library (.TPL) file is a simple catenation of Turbo
- Pascal Unit (.TPU) files. Since the length of a Unit may be
- determined from the Unit Header (see section 3.1), it is simple to see
- that one may "browse" through a .TPL file looking for an external unit
- such as SYSTEM.TPU. The supplied program does just that in its unit
- retrieval process so the TPUMOVER utility is no longer required for
- processing of units in TURBO.TPL.
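-
- The "browse" can be sketched as below. This is Python with
- hypothetical names. The two callbacks, unit_size and unit_name, stand
- in for the Unit Header calculations of section 3.1, which are not
- reproduced here, and units are assumed to follow one another with no
- gaps other than those already counted in each unit's length.
-
```python
def find_unit(library: bytes, wanted: str, unit_size, unit_name):
    """Return the bytes of the named unit, or None if it is not present.

    unit_size(library, pos) -> total length of the unit starting at pos
    unit_name(library, pos) -> name of the unit starting at pos"""
    pos = 0
    while pos < len(library):
        size = unit_size(library, pos)
        if size <= 0:
            raise ValueError("bad unit length at offset %d" % pos)
        if unit_name(library, pos).upper() == wanted.upper():
            return library[pos:pos + size]
        pos += size  # step over this unit to the next one
    return None
```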
-
-
- 9. APPLICATION NOTES
-
-
- One of the more obvious applications of this information would seem to
- be in the area of a Cross-Reference Generator.
-
- There is a very fine example of such a program in the public domain
- that was written by Mr. R. N. Wisan called "PXL". This program has
- been around since the days of Turbo Pascal Version 1. The program has
- been continually enhanced by the author in the way of features and for
- support of the newer Turbo Pascal versions. It does not, however,
- solve the problem of telling one which unit contains the definition
- of a given symbol. In fairness to "PXL", however, this is no small
- problem since the format of .TPU files keeps changing (Turbo 6.0
- Units are not object-code compatible with Turbo 5.x Units, and so
- on...) and Mr. Wisan probably has more than enough other projects to
- keep himself occupied.
-
- However, for the user who is willing to work a little (maybe a lot?),
- this document would seem to provide the information needed to add such
- a function to his own pet cross-reference generator.
-
- Further, with SIGNIFICANTLY more effort, it should be possible to do
- much of the job of de-compilation -- provided the DEBUG dictionary is
- present. At the very least, most declarations should be recoverable.
- It's another thing entirely to try to reconstruct plausible TURBO
- Pascal code from the CSegs. This would be a formidable task and lots
- of knowledge about TURBO's code generators would have to be acquired.
- At present, the only way I know to get this information is to have the
- run-time library source code and then work-work-work at testing code
- produced by the compiler for a huge number of test case units. You
- have to want to do this really badly in order to invest the time. I
- am not that tired of living.
-
-
- 10. ACKNOWLEDGEMENTS
-
-
- This project would have been totally infeasible without the aid of
- some very fine tools. As it was, several hundred man hours have been
- expended on it and as you can see, there are a few unresolved issues
- that have been (graciously) left for others to address. The tools
- used by this author consisted of:
-
- 1) Turbo Pascal 6.0 Professional by Borland International
-
- 2) Microsoft WORD (version 5.0)
-
- 3) LIST (version 7.5) by Vernon D. Buerg
-
- 4) The DEBUG utility in MS-DOS Version 3.3
-
- 5) PARADOX 3.0 by Borland International
-
- 6) QUATTRO PRO by Borland International
-
- 7) TURBO ASSEMBLER 1.1 by Borland International
-
- (PARADOX and QUATTRO PRO were used for data collection and analysis in
- the course of coding the recognizer tables for the disassembler unit.)
-
- The references listed were of great value in this project. [Intel85]
- was a valuable source of information about coprocessor instructions as
- well as offering hints about the differences between the 8086/8088 and
- the 80286. The [Borland] TASM manuals offered further info on the
- 80186. [Nelson] provided presentations of well-organized data
- directed at the problem of disassembly but the tables were flawed by a
- number of errors which crept into my databases and which caused much
- of the extra debugging effort. [Intel89] offered valuable insights on
- the 80386 addressing schemes as well as the 32-bit data extensions.
- Finally, [Brown] provided valuable clues on the Floating-Point
- emulators used by Borland (and Microsoft?). As you can see, the
- amount of hard information available to me on this project was quite
- limited since I am unaware of any other existing body of literature on
- this subject.
-
- That's it, folks. Does anyone wonder why it took several hundred man
- hours to get to this point? It took a lot of hard (and at times
- tedious) work coupled with a great many lucky guesses to achieve what
- you see here.
-
-
- 11. REFERENCES
-
-
- [Borland], TURBO ASSEMBLER REFERENCE GUIDE, Borland International,
- 1988.
-
- [Borland], TURBO ASSEMBLER USER'S GUIDE, Borland International, 1988.
-
- [Borland], TURBO PASCAL 6.0 PROGRAMMING GUIDE, Borland International,
- 1990.
-
- [Borland], TURBO PASCAL LIBRARY REFERENCE Version 6.0, Borland
- International, 1990.
-
- [Borland], TURBO PASCAL USER'S GUIDE Version 6.0, Borland
- International, 1990.
-
- [Brown], INTER191.ARC, Ralf Brown, 1991.
-
- [Intel85], iAPX 286 PROGRAMMER'S REFERENCE MANUAL INCLUDING THE iAPX
- 286 NUMERIC SUPPLEMENT, Intel Corporation, 1985 (order
- number 210498-003).
-
- [Intel89], 386 SX MICROPROCESSOR PROGRAMMER'S REFERENCE MANUAL, Intel
- Corporation, 1989 (order number 240331-001).
-
- [Nelson], THE 80386 BOOK: ASSEMBLY LANGUAGE PROGRAMMER'S GUIDE FOR
- THE 80386, Ross P. Nelson, Microsoft Press, 1988.
-
- [Scanlon], 80286 ASSEMBLY LANGUAGE ON MS-DOS COMPUTERS, Leo J.
- Scanlon, Brady, 1986.
-
-
- INDEX
-
-
- .OBJ file 12, 13, 30, 31, 33
- .TPL file 6, 14, 37, 38, 43
- .TPU
-    file 5, 7, 11, 14, 23, 37, 43, 44
-    size 14
-    SYSTEM 6, 16, 17, 18, 23, 37, 40, 43
- Assembler 6
- Attribute
-    ABSOLUTE 7
-    EXTERNAL 20, 30
- Call Model
-    ASSEMBLER 20
-    FAR 20
-    INLINE 20
-    INTERRUPT 20
- CONST 6, 11, 12, 13, 19, 24, 26, 31, 35, 36, 38
- Constraint 28, 29
- CSeg 6, 11, 12, 30, 31, 32, 33, 34, 35, 36, 38
- Defining block 31, 32
- Directive 12, 13, 14, 20, 30, 33, 34
- External 7, 30, 32, 36, 39, 43
- Hash 11, 12, 13, 14, 15, 16, 17, 20, 25, 26, 39, 40
- Include 33, 34
- Interface 6, 11, 12, 13, 14, 15, 16, 17, 22, 40
- Locator
-    LG 7, 10, 18, 19, 21, 23, 25, 26, 27, 28, 29
-    LL 7, 11, 16, 22, 30, 40
-    offset 7, 9, 10, 19, 20, 26, 30, 31, 34, 35, 36
- Method 20
-    CONSTRUCTOR 20
-    DESTRUCTOR 20
-    Self 19
- Operand offset 36
- Parameter 18, 19, 20, 21, 29
- PROC 6, 11, 12, 20, 30, 34, 36, 38, 39
- SEGMENT 36
- Signature 5, 22
- Stub 7, 17, 18, 19
- Type Descriptor 18, 19, 21, 23, 25, 26, 27, 28, 29, 40, 41
- VAR 32, 38
- VMT 12, 13, 20, 26, 31
-